1 Introduction
Recently, researchers have made the unexpected discovery that state-of-the-art object classifiers can be fooled easily by small perturbations in the input that are unnoticeable to human eyes [29, 8]. Subsequent studies tried to explain the cause of this apparent failure of deep learning on such adversarial examples. The vulnerability has been ascribed to linearity [8], low flexibility [5], or the flatness/curvedness of decision boundaries [23], but a more complete picture is still under research. This is troublesome since such a vulnerability can be exploited in critical situations, such as an autonomous car misreading traffic signs or a facial recognition system granting access to an impersonator, without being noticed. Several methods of generating adversarial examples have been proposed [8, 22, 3], most of which use knowledge of the classifier to craft examples. In response, a few defense methods have been proposed: retraining the target classifier with adversarial examples, called adversarial training [29, 8]; suppressing gradients by retraining with soft labels, called defensive distillation [26]; and hardening the target classifier by training with an ensemble of adversarial examples [30]. (See Related work for descriptions of more methods.)
In this paper we focus on white-box attacks, where the model and the parameters of the classifier are known to the attacker. This requires a genuinely robust classifier or defense method, since the defender cannot rely on the secrecy of the parameters as a defense. To emphasize the dynamic nature of attack-defense, we start with the following simple experiment (see Sec. 3.1 for a full description). Suppose first that a classifier is trained on an original, non-adversarial dataset. Using the trained classifier parameters, an attacker can then generate adversarial examples, e.g., using the fast gradient sign method (FGSM) [8], which is known to be simple and effective. However, if the defender/classifier (the defender and the classifier are treated synonymously in this paper) has access to those adversarial examples, the defender can significantly weaken the attack by retraining the classifier with the adversarial examples, called adversarial training. We can repeat the two steps – adversarial sample generation and adversarial training – many times, and what is observed in the process (Sec. 3.1) is that an attack/defense can be very effective against the immediately-preceding defense/attack, but not necessarily against non-immediately-preceding defenses/attacks. This is one of many examples showing that the effectiveness of an attack/defense method depends critically on the defense/attack it is used against, from which we conclude that the performance of an attack/defense method has to be evaluated and reported as an attack-defense pair and not in isolation.
To better understand the interaction of attack and defense in the adversarial example problem, we formulate adversarial attack/defense on machine learning classifiers as a two-player continuous pure-strategy zero-sum game. The game is played by an attacker and a defender, where the attacker tries to maximize the risk of the classification task by perturbing input samples under certain constraints, and the defender tries to adjust the classifier parameters to minimize the same risk function given the perturbed inputs. The ideal adversarial examples are the global maximizers of the risk without constraints (except for certain bounds such as an $\ell_\infty$-norm bound). However, such a space of unconstrained adversarial samples is very large – any real vector of the given input size is potentially an adversarial sample, regardless of whether it is a sample from the input data distribution. The vastness of the space of adversarial examples is a hindrance to the study of the problem, since it is difficult for the defender to model and learn the attack class from a finite number of adversarial examples and generalize to future attacks. To study the problem more concretely, we use two representative classes of attacks. The first type is of gradient type – the attacker mainly uses the gradient of the classifier output with respect to the input to generate adversarial examples. This includes the fast gradient sign method (FGSM) [8] and the iterative version (IFGSM) [13] of FGSM. Attacks of this type can be considered an approximation of the full maximization of the risk by one or a few steps of gradient-based maximization. The second type is a neural-network-based attack which is capable of learning. This attack network is trained from data so that it takes a (clean) input and generates a perturbed output to maximally fool the classifier in consideration. The ‘size’ of the attack class is directly related to the parameter space of the neural network architecture, e.g., all perturbations that can be generated by the fully-connected three-layer ReLU networks that we use in this paper. Similar to what we propose, others have recently considered training neural networks to generate adversarial examples [25, 1]. While the network-based attack is a subset of the space of unconstrained attacks, it can generate adversarial examples with only a single feedforward pass through the neural network at test time, making it suitable for real-time attacks, unlike other more time-consuming attacks. We later show empirically that this class of neural-network-based attacks is quite different from the class of gradient-based attacks.
As a two-player game, there may not be a dominant defense that is robust against all types of attacks. However, there is a natural notion of the best defense or attack in the worst case. Suppose one player moves first by choosing her parameters and the other player responds with the knowledge of the first player’s move. This is an example of a leader-follower game [2], for which there are two known equilibria – the minimax and the maximin points – if it is a constant-sum game. Such a defense/attack is theoretically an ideal pure strategy in the leader-follower setting, but one has to actually find it for the given class of defense/attack and the dataset in order to deploy the defense/attack. To find minimax solutions numerically, we propose continuous optimization algorithms for the gradient-based attacks (Sec. 3) and the network-based attacks (Sec. 4), based on alternating minimization with gradient-norm penalization. Experiments with the MNIST and the CIFAR-10 datasets show that the minimax defense found by the algorithm is indeed more robust overall than non-minimax defenses, including adversarially-trained classifiers against specific attacks. However, the results also show that the minimax defense is still vulnerable to some degree to out-of-class attacks, e.g., the gradient-based minimax defense is not equally robust against network-based attacks. This exemplifies the difficulty of achieving the minimax defense against all possible attack types in reality. Our paper is a first step towards this goal, and future work is discussed in Sec. 5.
The contributions of this paper can be summarized as follows.

We explain and formulate the adversarial example problem as a two-player continuous game, and demonstrate the fallacy of evaluating a defense or an attack as a static problem.

We show the difficulty of achieving robustness against all types of attacks, and present the minimax defense as the best worst-case defense.

We present two types of attack classes – gradient-based and network-based attacks. The former class represents the majority of known attacks in the literature, and the latter class represents new attacks capable of generating adversarial examples possessing very different properties from the former.

We provide continuous minimax optimization methods to find the minimax point for the two classes, and contrast them with non-minimax approaches.

We demonstrate our game formulation using two popular machine learning benchmark datasets, and provide empirical evidence for our claims.
For readability, details about experimental settings and the results with the CIFAR-10 dataset are presented in the appendix.
2 Related work
Making a classifier robust to test-time adversarial attacks has been studied for linear (kernel) hyperplanes [14, 4] and SVMs [6], which also showed the game-theoretic nature of robust classification problems. Since the recent discovery of adversarial examples for deep neural networks, several methods of generating adversarial samples have been proposed [29, 8, 12, 22, 3], as well as several methods of defense [29, 8, 26, 30]. These papers considered static scenarios, where the attack/defense is constructed against a fixed opponent. A few researchers have also proposed using a detector to detect and reject adversarial examples [18, 15, 21]. While we do not use detectors in this work, the minimax approach we propose in the paper can be applied to train such detectors. The idea of using neural networks to generate adversarial samples has appeared concurrently [1, 25]. Similar to our paper, the two papers demonstrate that it is possible to generate strong adversarial samples by a learning approach. The former [1] explored different architectures for the “adversarial transformation networks” against several different classifiers. The latter [25] proposed “attack learning neural networks” to map clean samples to a region of the feature space where misclassification occurs, and “defense learning neural networks” to map them back to the safe region. Instead of prepending the defense layers before the fixed classifier [25], we retrain the whole classifier as a defense method. However, the key difference of our work from the two papers is that we consider the dynamics of a learning-based defense stacked with a learning-based attack, and the numerical computation of the optimal defense/attack by continuous optimization. Extending the model of [12], an alternating optimization algorithm for finding saddle points of the iterative FGSM-type attack and the adversarial training defense was proposed recently in [17]. The algorithm is similar to what we propose in Sec. 3, but the difference is that we seek to find minimax points instead of saddle points, and that we consider both the gradient-based and the network-based attacks. The importance of distinguishing minimax and saddle-point solutions for machine learning problems was explained in [11], along with new algorithms for handling multiple local optima. The alternating optimization method for finding an equilibrium of a game has gained renewed interest since the introduction of Generative Adversarial Networks (GANs) [7]. However, the instability of the alternating gradient-descent method is well known, and the “unrolling” method [20] was proposed to stabilize GAN training. The optimization algorithm proposed in this paper has a similarity to the unrolling method, but it is simpler and involves a gradient-norm regularization which can be interpreted intuitively as sensitivity penalization [9, 16, 19, 24, 28].
Lastly, a related framework for finding minimax risk was also studied in [10] for the purpose of preventing attacks on privacy. We discuss how the attack on classification in this paper and the attack on privacy are two sides of the same optimization problem with opposite goals.
3 Minimax defense against gradient-based attacks
A classifier whose parameters are known to an attacker is easy to attack. Conversely, an attacker whose adversarial samples are known to a classifier is easy to defend against. In this section, we first demonstrate the cat-and-mouse nature of this interaction, using adversarial training as the defense and the fast gradient sign method (FGSM) [8] as the attack. We then describe a more general form of the game, and algorithms for finding minimax solutions using sensitivity-penalized optimization.
3.1 A motivating observation
Suppose $f(x; u)$ is a classifier with parameters $u$ and $l(x, y; u)$ is the loss of classifying a sample $x$ with label $y$. The untargeted FGSM attack generates a perturbed example $x^{adv}$ from the clean sample $(x, y)$ as follows:
$x^{adv} = x + \epsilon\, \mathrm{sign}\big(\nabla_x\, l(x, y; u)\big).$  (1)
Untargeted means that the goal of the attacker is to induce misclassification regardless of which classes the samples are misclassified into, as long as they differ from the original classes. We will not discuss defenses against targeted attacks, as they are analogous to their untargeted counterparts. The clean input images we use here are normalized, that is, all pixel values are in the range $[0, 1]$. Although simple, FGSM is very effective at fooling the classifier. Table 1 demonstrates this against a convolutional neural network trained with clean images from MNIST. (Details of the classifier architecture and the settings are in the appendix.)
Defense\Attack  No attack  FGSM  

ε=0.1  ε=0.2  ε=0.3  ε=0.4
No defense  0.026  0.446  0.933  0.983  0.985 
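The FGSM step of Eq. 1 is a one-line computation once automatic differentiation is available. The following is a minimal sketch (not the code used in the paper) written in PyTorch, assuming a differentiable classifier `model`, a batch of normalized inputs `x` in [0, 1], labels `y`, and cross-entropy as a stand-in for the loss $l$:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Untargeted FGSM (Eq. 1): one signed-gradient step of size eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Move in the direction that increases the loss, then keep pixels in [0, 1].
    x_adv = x_adv.detach() + eps * grad.sign()
    return x_adv.clamp(0.0, 1.0)
```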
On the other hand, these attacks, if known to the classifier, can be weakened by retraining the classifier with the original dataset augmented by adversarial examples with ground-truth labels, known as adversarial training. In this paper we use a mixture of the clean and the adversarial samples for adversarial training. Table 2 shows the result of adversarial training for different values of ε. After adversarial training, the test error rates on adversarial test examples are reduced back to the level (1–2%) before the attack. This is in stark contrast with the high misclassification rates of the undefended classifier in Table 1.
Defense\Attack  No attack  FGSM  

ε=0.1  ε=0.2  ε=0.3  ε=0.4
Adv train  n/a  0.010  0.011  0.015  0.017 
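Adversarial training as used above retrains the classifier on a mixture of clean and FGSM-perturbed examples with their ground-truth labels. A minimal sketch of one training epoch, reusing the hypothetical `fgsm_attack` helper from the previous listing (optimizer and loader names are illustrative):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps):
    """One epoch of adversarial training on a 50/50 mix of clean and FGSM examples."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_attack(model, x, y, eps)   # attack the current classifier
        x_mix = torch.cat([x, x_adv], dim=0)    # mixture of clean and adversarial samples
        y_mix = torch.cat([y, y], dim=0)
        optimizer.zero_grad()
        F.cross_entropy(model(x_mix), y_mix).backward()
        optimizer.step()
```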
This procedure of 1) adversarial sample generation using the current classifier, and 2) retraining the classifier using the current adversarial examples, can be repeated for many rounds. Let's denote the attack on the original classifier as FGSM1, and the corresponding retrained classifier as Adv FGSM1. Repeating the procedure above generates the sequence of attacks and defenses: FGSM1 → Adv FGSM1 → FGSM2 → Adv FGSM2 → FGSM3 → Adv FGSM3, etc. The odd terms are attacks and the even terms are defenses (i.e., classifiers).
We repeat these two steps for 80 rounds. As a preview, Table 3 shows the test errors of the defense-attack pairs, where the defense is one of {No defense, Adv FGSM1, Adv FGSM2, …} and the attack is one of {No attack, FGSM1, FGSM2, …}. Throughout the paper we use the following conventions for tables: the rows correspond to defense methods, the columns correspond to attack methods, and all numbers are test errors. It is observed that a defense is effective against the immediately-preceding attack (e.g., the Adv FGSM1 defense has a low error against the FGSM1 attack), and similarly an attack is effective against the immediately-preceding defense (e.g., the FGSM2 attack incurs a high error against Adv FGSM1). However, a defense/attack is not necessarily robust against other, non-immediately-preceding attacks/defenses. From this we make two observations. First, the effectiveness of an attack/defense method depends critically on the defense/attack it is used against, and therefore the performance of an attack/defense should be evaluated as an attack-defense pair and not in isolation. Second, it is not enough for a defender to choose the classifier parameters in response to a specific attack, i.e., adversarial training; it should use a more principled method of selecting robust parameters. We address these below.
3.2 Gradient-based attacks and generalization
We first consider the interaction of the classifier and the gradient-based attack as a continuous two-player pure-strategy zero-sum game. To emphasize the parameters $u$ of the classifier/defender, let's write the empirical risk of classifying the perturbed data as
$R(u, \{z_i\}) = \frac{1}{N}\sum_{i=1}^{N} l(z_i, y_i; u),$  (2)
where $z_i = z_i(u)$ denotes a gradient-based attack based on the loss gradient,
$z_i(u) = x_i + \alpha\, \nabla_x\, l(x_i, y_i; u),$  (3)
and $\{(z_i, y_i)\}_{i=1}^{N}$ is the sequence of perturbed examples.
Given the classifier parameter $u$, the gradient-based attack (Eq. 3) can be considered a single-step approximation to the general attack
$\max_{\{\delta_i\}}\; R(u, \{x_i + \delta_i\}),$  (4)
where $\delta_i$ can be any adversarial pattern subject to bounds such as $\|\delta_i\|_\infty \le \epsilon$ and $0 \le x_i + \delta_i \le 1$. Consequently, the goal of the defender is to choose the classifier parameter $u$ to minimize the maximum risk from such attacks [12, 17],
$\min_{u}\; \max_{\{\delta_i\}}\; R(u, \{x_i + \delta_i\}),$  (5)
with the same bound constraints. This general minimax optimization is difficult to solve directly due to the large search space of the inner maximization. In this respect, existing attack methods such as FGSM, IFGSM, or Carlini-Wagner [3] can be considered heuristics or approximations of the true maximization.
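For concreteness, the inner maximization of Eq. 4 is typically approximated by a few projected gradient steps, as in IFGSM [13]. A hedged sketch in the same style as the FGSM listing above (step size and iteration count are illustrative defaults, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def ifgsm_attack(model, x, y, eps, step=0.01, n_iter=10):
    """Approximate the max of the loss over the eps-ball by iterative ascent with projection."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        # Project back onto the eps-ball around x and onto the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv
```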
3.3 Minimax solutions
We describe an algorithm to find solutions of Eq. 5 for gradient-based attacks (Eq. 3). In expectation of the attack, the defender should choose $u$ to minimize $R(u, \{z_i(u)\})$, where the dependence of the attack on the classifier is expressed explicitly. If we minimize this using gradient descent,
$u \leftarrow u - \lambda\, \frac{d\, R(u, \{z_i(u)\})}{d u},$  (6)
then from the chain rule, the total derivative is
$\frac{d\, R(u, \{z_i(u)\})}{d u} = \frac{\partial R}{\partial u} + \sum_{i} \left(\frac{\partial z_i(u)}{\partial u}\right)^{T} \frac{\partial R}{\partial z_i}.$  (7)
Interestingly, this total derivative (Eq. 7), evaluated at the current state, coincides with the gradient of the following cost,
$\tilde{R}(u) = R(u, \{z_i\}) + \frac{\alpha}{2N} \sum_{i} \left\|\nabla_x\, l(z_i, y_i; u)\right\|^2,$  (8)
where $z_i$ is the current perturbed example $z_i(u)$ treated as a constant. There are two implications. In terms of interpretation, this cost function is the sum of the original risk and a ‘sensitivity’ term which penalizes abrupt changes of the risk w.r.t. the input. Therefore, $u$ is chosen at each iteration not only to decrease the risk but also to make the classifier insensitive to input perturbations, so that the attacker cannot take advantage of large gradients. The idea of minimizing the sensitivity to the input is a familiar approach to robustifying classifiers [9, 16]. Secondly, the new formulation can be implemented easily. The gradient descent update using the seemingly complicated total derivative (Eq. 7) can be replaced by a gradient descent update of Eq. 8. The automatic differentiation capability [27] of modern machine learning libraries can be used to compute the gradient of Eq. 8 efficiently.
We find the solution to the minimax problem by iterating two steps. In the max step, we generate the current adversarial patterns $\{z_i\}$, and in the min step, we update the classifier parameters $u$ using Eq. 8. In practice, we require the adversarial patterns to satisfy $\|z_i - x_i\|_\infty \le \epsilon$ and $0 \le z_i \le 1$, and therefore we use the FGSM method (Eq. 1) to generate them in the max step. The resultant classifier parameters after convergence will be referred to as the minimax defense against gradient-based attacks (MinimaxGrad). Note that this algorithm is similar to, but different from, the algorithms of [12, 17]. Firstly, we use only one gradient step to compute the adversarial patterns, although we can use multiple steps as in [17]. More importantly, the sensitivity penalty in Eq. 8 plays an important role for convergence to a minimax solution. In contrast, simply alternating the max and min steps without the penalty term does not guarantee convergence to minimax points unless the minimax points are also saddle points (see [11] for a description of the difference). This subtle difference will be observed in the experiments.
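One min-max round of MinimaxGrad can be sketched as follows: an FGSM max step (Eq. 1) followed by a min step on the sensitivity-penalized cost of Eq. 8, where the input-gradient norm is differentiated through a second backward pass. This is our own illustration under the assumptions above (cross-entropy loss, the `fgsm_attack` helper from the earlier listing, penalty coefficient `alpha`), not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def minimax_grad_step(model, x, y, optimizer, eps, alpha):
    """Max step (FGSM) followed by a sensitivity-penalized min step (Eq. 8)."""
    x_adv = fgsm_attack(model, x, y, eps).detach().requires_grad_(True)
    risk = F.cross_entropy(model(x_adv), y)
    # Input gradient of the risk; create_graph=True keeps it differentiable w.r.t. u.
    grad_x = torch.autograd.grad(risk, x_adv, create_graph=True)[0]
    sensitivity = x.shape[0] * grad_x.pow(2).sum()   # ~ (1/N) * sum_i ||grad_x l_i||^2
    loss = risk + 0.5 * alpha * sensitivity          # Eq. 8
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```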
3.4 Experiments
We find the defense parameters using the algorithm above, which are expected to be robust to gradient-based attacks. Fig. 1 shows the decrease of the test error during training using this gradient descent approach for MNIST.
We reiterate the result of the cat-and-mouse game in Sec. 3.1 and contrast it with the minimax solution (MinimaxGrad). Table 3 shows that the adversarially-trained classifier (Adv FGSM1) is robust to both clean data and the FGSM1 attack, but is susceptible to the FGSM2 attack, showing that the defense is only effective against immediately-preceding attacks. The same holds for Adv FGSM2, Adv FGSM3, etc. After 80 rounds of the cat-and-mouse procedure, the classifier Adv FGSM80 becomes robust to FGSM80 as well as moderately robust to other attacks including FGSM81 (=FGSMcurr). However, MinimaxGrad from the minimization of Eq. 8 is even more robust against FGSMcurr than Adv FGSM80 and is overall the best (see the last column, the "worst" result). To see the advantage of the sensitivity term in Eq. 8, we also performed the minimization of Eq. 8 without the sensitivity term under the same conditions as MinimaxGrad. This optimization method is similar to the method proposed in [12], which we will refer to as LWA (Learning with Adversaries). Note that LWA is a saddle-point solution for gradient-based attacks since it solves the min and the max problems symmetrically. In the table, one can see that MinimaxGrad is also better than LWA overall, although the difference is not large. To improve the minimax defense even further, we can choose a larger attack class than single-gradient-step attacks. Note that this will come at the cost of increased difficulty in the minimax optimization.
Defense\Attack  No attack  FGSM  FGSMcurr  worst  

FGSM1  FGSM2  FGSM80  
ε=0.1  No defense  0.026  0.446  0.073  0.054  0.446  0.446
Adv FGSM1  0.008  0.010  0.404  0.037  0.435  0.435  
Adv FGSM2  0.011  0.311  0.009  0.038  0.442  0.442  
Adv FGSM80  0.007  0.028  0.018  0.010  0.117  0.117  
LWA  0.009  0.044  0.030  0.022  0.019  0.044  
MinimaxGrad  0.006  0.014  0.015  0.014  0.025  0.025  
ε=0.2  No defense  0.026  0.933  0.215  0.089  0.933  0.933
Adv FGSM1  0.009  0.011  0.816  0.067  0.816  0.816  
Adv FGSM2  0.008  0.904  0.010  0.082  0.840  0.904  
Adv FGSM80  0.007  0.087  0.053  0.013  0.131  0.131  
LWA  0.007  0.157  0.034  0.036  0.026  0.157  
MinimaxGrad  0.008  0.082  0.085  0.049  0.027  0.085  
ε=0.3  No defense  0.026  0.983  0.566  0.087  0.983  0.983
Adv FGSM1  0.010  0.015  0.892  0.080  0.892  0.892  
Adv FGSM2  0.010  0.841  0.017  0.058  0.764  0.841  
Adv FGSM80  0.007  0.352  0.117  0.021  0.043  0.352  
LWA  0.008  0.130  0.077  0.047  0.034  0.130  
MinimaxGrad  0.008  0.062  0.144  0.045  0.036  0.144  
ε=0.4  No defense  0.026  0.985  0.806  0.122  0.985  0.985
Adv FGSM1  0.010  0.017  0.898  0.102  0.898  0.898  
Adv FGSM2  0.010  0.681  0.022  0.092  0.686  0.686  
Adv FGSM80  0.008  0.688  0.330  0.029  0.031  0.688  
LWA  0.009  0.355  0.171  0.086  0.042  0.355  
MinimaxGrad  0.009  0.081  0.221  0.076  0.026  0.221 
4 Minimax defense against network-based attacks
In this section, we consider another class of attacks – the neural-network-based attacks. We present an algorithm for finding minimax solutions for this attack class, and contrast the minimax solution with saddle-point and maximin solutions.
4.1 Learning-based attacks
Again, let $f(x; u)$ be a classifier parameterized by $u$ and $l(x, y; u)$ be a loss function. The class of adversarial patterns for the general minimax problem (Eq. 5) is very large, which results in strong but non-generalizable adversarial examples. Non-generalizable means the perturbation has to be recomputed, by solving the optimization, for every new test sample $x$. While such an ideal attack is powerful, its large size makes it difficult to analytically study the optimal defense methods. In Sec. 3, we restricted this class to gradient-based attacks. In this section, we restrict the class of patterns to those which can be generated by a flexible but manageable class of perturbations $g(x, y; v)$, e.g., a neural network of a fixed architecture whose parameter $v$ is the network weights. This class is clearly a subset of the general attacks, but it is generalizable, i.e., no time-consuming optimization is required in the test phase, only single feedforward passes through the network. The attack network (AttNet), as we will call it, can be of any class of appropriate neural networks; here we use a three-layer fully-connected ReLU network with 300 hidden units per layer. Different from [25] or [1], we feed the label $y$ into the input of the network along with the features $x$. This is analogous to using the true label in the original FGSM. While this label input is not necessary, it can make the training of the attack network easier. As with other attacks, we impose the $\ell_\infty$-norm constraint on the perturbation, i.e., $\|g(x, y; v) - x\|_\infty \le \epsilon$.
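As a concrete illustration of this attack class, the sketch below builds a three-hidden-layer fully-connected ReLU attack network with 300 units per layer that takes the label as an extra one-hot input and outputs a perturbed image. The tanh output scaled by ε is one convenient way to enforce the $\ell_\infty$ bound; it is our assumption for the sketch, not necessarily the exact construction used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttNet(nn.Module):
    """Attack network g(x, y; v): maps (x, y) to a perturbed input within an eps-ball."""
    def __init__(self, in_dim, n_classes, eps, hidden=300):
        super().__init__()
        self.eps, self.n_classes = eps, n_classes
        self.net = nn.Sequential(
            nn.Linear(in_dim + n_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim),
        )

    def forward(self, x, y):
        x_flat = x.flatten(1)
        y_onehot = F.one_hot(y, self.n_classes).float()      # label fed in with the features
        delta = self.eps * torch.tanh(self.net(torch.cat([x_flat, y_onehot], dim=1)))
        return (x_flat + delta).clamp(0.0, 1.0).view_as(x)   # |delta|_inf <= eps, pixels in [0, 1]
```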
Suppose now $R(u, v) = \frac{1}{N}\sum_{i=1}^{N} l\big(g(x_i, y_i; v),\, y_i;\, u\big)$ is the empirical risk of a classifier-attacker pair, where the input is first transformed by the attack network $g(\cdot; v)$ and then fed to the classifier $f(\cdot; u)$. The attack network can be trained by gradient methods as well. Given the classifier parameter $u$, we can use the gradient-ascent update
$v \leftarrow v + \mu\, \frac{\partial R(u, v)}{\partial v}$  (9)
to find an optimal attacker $v$ that maximizes the risk for the given fixed classifier $u$. Table 4 compares the error rates of the FGSM attacks and the attack network (AttNet). The table shows that AttNet is better than or comparable to FGSM in all cases. In particular, we already observed that the FGSM attack is not effective against classifiers hardened against gradient-based attacks (Adv FGSM80 or MinimaxGrad), but AttNet can still incur significant error on those hardened defenders (e.g., 0.498–1.000 in Table 4). This indicates that the class of learning-based attacks is indeed different from the class of gradient-based attacks.
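Training the attack network against a fixed classifier (Eq. 9) is then ordinary stochastic optimization of the negated risk over the attacker parameters $v$. A minimal sketch reusing the hypothetical AttNet module above (optimizer choice and hyperparameters are illustrative):

```python
import torch
import torch.nn.functional as F

def train_attnet(classifier, attnet, loader, n_epochs=5, lr=1e-3):
    """Maximize the classifier's risk over the attack-network parameters v (Eq. 9)."""
    opt_v = torch.optim.Adam(attnet.parameters(), lr=lr)
    classifier.eval()
    for _ in range(n_epochs):
        for x, y in loader:
            risk = F.cross_entropy(classifier(attnet(x, y)), y)
            opt_v.zero_grad()
            (-risk).backward()   # gradient ascent on the risk
            opt_v.step()
    return attnet
```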
Defense\Attack  FGSMcurr  AttNetcurr  worst  FGSMcurr  AttNetcurr  worst 

ε=0.1  ε=0.2
No defense  0.446  0.697  0.697  0.933  0.999  0.999 
Adv FGSM1  0.435  0.909  0.909  0.816  0.897  0.897 
Adv FGSM80  0.117  0.786  0.768  0.131  1.000  1.000 
MinimaxGrad  0.025  0.498  0.498  0.085  0.956  0.956 
ε=0.3  ε=0.4
No defense  0.983  1.000  1.000  0.985  1.000  1.000 
Adv FGSM1  0.892  1.000  1.000  0.898  1.000  1.000 
Adv FGSM80  0.352  0.887  0.887  0.688  1.000  1.000 
MinimaxGrad  0.144  1.000  1.000  0.221  1.000  1.000 
4.2 Minimax solution
We consider the two-player zero-sum game between a classifier and a network-based attacker where each player can choose its own parameters. Given the current classifier $u$, an optimal white-box attacker parameter $v^*(u)$ is the maximizer of the risk,
$v^*(u) = \arg\max_{v}\; R(u, v).$  (10)
Consequently, the defender should choose the classifier parameters $u^*$ such that the maximum risk is minimized,
$u^* = \arg\min_{u}\; \max_{v}\; R(u, v).$  (11)
As before, this solution to the continuous minimax problem has a natural interpretation as the best worst-case solution. Assuming the attacker is optimal, i.e., it chooses the best attack $v^*(u)$ from Eq. 10 given $u$, no other defense can achieve a lower risk than the minimax defense $u^*$ in Eq. 11. The minimax defense is also a conservative defense: if the attacker is not optimal, and/or if the attacker does not know the defense exactly (as in black-box attacks), the actual risk can be lower than what the minimax solution predicts. Before proceeding further, we point out that the claims above apply to the global minimizer $u^*$ and the maximizer function $v^*(\cdot)$, but in practice we can only find local solutions for the complex risk functions of deep classifiers and attackers.
To solve Eq. 11, we analyze the problem similarly to Eqs. 6–8 from the previous section. At each iteration, the defender should choose $u$ in expectation of the attack and minimize $R(u, v^*(u))$. We use gradient descent,
$u \leftarrow u - \lambda\, \frac{d\, R(u, v^*(u))}{d u},$  (12)
where the total derivative is
$\frac{d\, R(u, v^*(u))}{d u} = \frac{\partial R}{\partial u} + \left(\frac{\partial v^*(u)}{\partial u}\right)^{T} \frac{\partial R}{\partial v}.$  (13)
Since the exact maximizer $v^*(u)$ is difficult to find, we only update $v$ incrementally by one (or more) steps of the gradient-ascent update
$v^{+} = v + \mu\, \frac{\partial R(u, v)}{\partial v}.$  (14)
The resulting formulation is closely related to the unrolled optimization [20] proposed for training GANs, although the latter has a very different cost function. Using the single update (Eq. 14), the total derivative is
$\frac{d\, R(u, v^{+}(u))}{d u} = \frac{\partial R}{\partial u} + \mu\, \frac{\partial^2 R}{\partial u\, \partial v} \frac{\partial R}{\partial v}.$  (15)
Similar to hardening a classifier against gradient-based attacks by minimizing Eq. 8 at each iteration, the gradient update of $u$ in Eq. 12 can be done using the gradient of the following sensitivity-penalized function,
$\tilde{R}(u) = R(u, v) + \frac{\mu}{2}\left\|\frac{\partial R(u, v)}{\partial v}\right\|^2.$  (16)
In other words, $u$ is chosen not only to minimize the risk but also to prevent the attacker from exploiting the sensitivity of the risk to $v$. The algorithm is summarized in Alg. 1.
The classifier obtained after convergence will be referred to as MinimaxAttNet. Note that Alg. 1 is independent of the adversarial example problem presented in the paper, and can be used for other minimax problems as well.
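A minimal sketch of one iteration of Alg. 1 under the notation above: a gradient-ascent step on the attacker $v$ (Eq. 14) followed by a descent step on the classifier $u$ using the sensitivity-penalized risk of Eq. 16. The squared norm of $\partial R/\partial v$ is differentiated w.r.t. $u$ via create_graph; as before, the names and the choice of cross-entropy are our assumptions, not the paper's exact code.

```python
import torch
import torch.nn.functional as F

def minimax_attnet_step(classifier, attnet, x, y, opt_u, opt_v, mu):
    """One iteration of the sensitivity-penalized minimax update (Alg. 1, Eq. 16)."""
    # Max step: one gradient-ascent update of the attacker parameters v (Eq. 14).
    risk = F.cross_entropy(classifier(attnet(x, y)), y)
    opt_v.zero_grad()
    (-risk).backward()
    opt_v.step()

    # Min step: descend the risk plus the squared norm of dR/dv (Eq. 16).
    risk = F.cross_entropy(classifier(attnet(x, y)), y)
    grads_v = torch.autograd.grad(risk, list(attnet.parameters()), create_graph=True)
    penalty = sum(g.pow(2).sum() for g in grads_v)
    loss = risk + 0.5 * mu * penalty
    opt_u.zero_grad()
    loss.backward()
    opt_u.step()
```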
4.3 Minimax vs maximin solutions
In analogy with the minimax problem, we can also consider the maximin solution defined by
$v^{**} = \arg\max_{v}\; R(u^{**}(v), v),$  (17)
where
$u^{**}(v) = \arg\min_{u}\; R(u, v)$  (18)
is the minimizer function. Here we are slightly abusing notation for the minimax solution $u^*$, the maximin solution $v^{**}$, the minimizer $u^{**}(\cdot)$, and the maximizer $v^*(\cdot)$. Similar to the minimax solution, the maximin solution has an intuitive meaning – it is the best worst-case solution for the attacker. Assuming the defender is optimal, i.e., it chooses the best defense $u^{**}(v)$ from Eq. 18 that minimizes the risk given the attack $v$, no other attack can inflict a higher risk than the maximin attack $v^{**}$. It is also a conservative attack: if the defender is not optimal, and/or if the defender does not know the attack exactly, the actual risk can be higher than what the maximin solution predicts. Note that the maximin scenario, where the defender knows the attack method, is not very realistic, but it is the opposite of the minimax scenario and provides a lower bound.
To summarize, minimax and maximin defenses and attacks have the following inherent properties.
Lemma 1
Let $u^*$, $v^*(\cdot)$, $v^{**}$, and $u^{**}(\cdot)$ be the solutions of Eqs. 11, 10, 17, and 18, respectively.

$R(u, v) \le R(u, v^*(u))$ for all $v$: For any given defense $u$, the max attack $v^*(u)$ is the most effective attack.

$R(u^*, v^*(u^*)) \le R(u, v^*(u))$ for all $u$: Against the optimal attack $v^*(\cdot)$, the minimax defense $u^*$ is the most effective defense.

$R(u^{**}(v), v) \le R(u, v)$ for all $u$: For any given attack $v$, the min defense $u^{**}(v)$ is the most effective defense.

$R(u^{**}(v), v) \le R(u^{**}(v^{**}), v^{**})$ for all $v$: Against the optimal defense $u^{**}(\cdot)$, the maximin attack $v^{**}$ is the most effective attack.

$R(u^{**}(v^{**}), v^{**}) \le R(u^*, v^*(u^*))$: The risk of the best worst-case attack is lower than that of the best worst-case defense.
These properties follow directly from the definitions. The lemma helps us better understand the interdependence of defense and attack, and gives us the range of possible risk values, which can be measured empirically. To find maximin solutions, we use the same algorithm (Alg. 1) except that the variables $u$ and $v$ are switched and the sign of the risk $R$ is flipped before the algorithm is called. The resultant classifier will be referred to as MaximinAttNet.
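In code, the maximin counterpart only swaps the roles of the two players and flips the sign of the objective. A hedged sketch mirroring the minimax step above (the exact placement of the sensitivity penalty is our own derivation, consistent with the role swap described in the text):

```python
import torch
import torch.nn.functional as F

def maximin_attnet_step(classifier, attnet, x, y, opt_u, opt_v, mu):
    """Maximin update (MaximinAttNet): roles of u and v swapped, sign of the risk flipped."""
    # Min step: one gradient-descent update of the defender parameters u (inner player).
    risk = F.cross_entropy(classifier(attnet(x, y)), y)
    opt_u.zero_grad()
    risk.backward()
    opt_u.step()

    # Max step: ascend the risk minus the squared norm of dR/du (sign-flipped Eq. 16).
    risk = F.cross_entropy(classifier(attnet(x, y)), y)
    grads_u = torch.autograd.grad(risk, list(classifier.parameters()), create_graph=True)
    penalty = sum(g.pow(2).sum() for g in grads_u)
    loss = -(risk - 0.5 * mu * penalty)   # minimize the negative of the penalized objective
    opt_v.zero_grad()
    loss.backward()
    opt_v.step()
```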
4.4 Experiments
In addition to the minimax and maximin optimizations, we also consider as a reference algorithm the alternating descent/ascent method used in GAN training [7],
$u \leftarrow u - \lambda\, \frac{\partial R(u, v)}{\partial u}, \qquad v \leftarrow v + \mu\, \frac{\partial R(u, v)}{\partial v},$  (19)
and refer to its solution as AltAttNet. Similar to our discussion of MinimaxGrad and LWA in Sec. 3.4, the alternating descent/ascent finds local saddle points, which are not necessarily minimax or maximin solutions, and therefore its solution will in general be different from the solution of Alg. 1. The difference between the solutions of the three optimizations – MinimaxAttNet, MaximinAttNet, and AltAttNet – applied to a common problem is demonstrated in Fig. 2. The figure shows the test error over the course of optimization starting from random initializations. One can see that MinimaxAttNet (top blue curves) and AltAttNet (middle green curves) converge to different values, suggesting that the learned classifiers will also be different.
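For comparison, the plain alternating descent/ascent of Eq. 19 simply interleaves the two unpenalized updates; one iteration might look like this sketch:

```python
import torch.nn.functional as F

def alt_attnet_step(classifier, attnet, x, y, opt_u, opt_v):
    """One iteration of plain alternating descent/ascent (Eq. 19), i.e., AltAttNet."""
    # Ascent step on the attacker parameters v.
    risk = F.cross_entropy(classifier(attnet(x, y)), y)
    opt_v.zero_grad()
    (-risk).backward()
    opt_v.step()
    # Descent step on the classifier parameters u, without any sensitivity penalty.
    risk = F.cross_entropy(classifier(attnet(x, y)), y)
    opt_u.zero_grad()
    risk.backward()
    opt_u.step()
```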
Table 5 compares the robustness of the classifiers – MinimaxAttNet, AltAttNet, and MinimaxGrad (from Sec. 3) – against the AttNet attack and the FGSM attack. Not surprisingly, both MinimaxAttNet and AltAttNet are much more robust than MinimaxGrad against AttNet. MinimaxAttNet performs similarly to AltAttNet at ε=0.1 and 0.2, but is much better at ε=0.3 and 0.4. The different performance of MinimaxAttNet vs. AltAttNet implies that the minimax solution found by Alg. 1 is different from the solution found by alternating descent/ascent. In addition, against FGSM attacks, MinimaxAttNet is quite robust (0.058–0.116), despite the fact that these classifiers are not trained against gradient-based attacks at all. In contrast, MinimaxGrad is very vulnerable (0.498–1.000) against AttNet, as we have already observed. This result suggests that the class of AttNet attacks and the class of gradient-based attacks are indeed different, and that the former class partially subsumes the latter.
Defense\Attack  FGSMcurr  AttNetcurr  worst  FGSMcurr  AttNetcurr  worst 

ε=0.1  ε=0.2
MinimaxAttNet  0.058  0.010  0.058  0.109  0.010  0.109 
AltAttNet  0.048  0.010  0.048  0.096  0.016  0.096 
MinimaxGrad  0.025  0.498  0.498  0.085  0.956  0.956 
ε=0.3  ε=0.4
MinimaxAttNet  0.116  0.018  0.116  0.079  0.364  0.364 
AltAttNet  0.158  0.032  0.158  0.334  0.897  0.897 
MinimaxGrad  0.144  1.000  1.000  0.221  1.000  1.000 
Lastly, adversarial examples generated by the various attacks in the paper have diverse patterns and are shown in Fig. 3 of the appendix. All the experiments with the MNIST dataset presented so far are also performed with the CIFAR-10 dataset and are reported in the appendix. To summarize, the results with CIFAR-10 are similar: MinimaxGrad outperforms non-minimax defenses, and AttNet can attack classifiers which are hardened against gradient-based attacks. However, gradient-based attacks are also very effective against classifiers hardened against AttNet, and MinimaxAttNet and AltAttNet perform similarly, neither of which is the case with the MNIST dataset. The issue of defending against out-of-class attacks is discussed in the next section.
5 Discussion
5.1 Robustness against multiple attack types
We discuss some limitations of the current study and propose an extension. Ideally, a defender should find a robust classifier against the worst attack from a large class of attacks, as in the general minimax problem (Eq. 5). However, it is difficult to train classifiers with a large class of attacks, due to the difficulty of modeling the class and of the optimization itself. On the other hand, if the class is too small, then the worst attack from that class is not representative of all possible worst attacks, and therefore the minimax defense found will not be robust to out-of-class attacks. The trade-off seems inevitable.
It is, however, possible to build a defense against multiple specific types of attacks. Suppose $a_1, \ldots, a_K$ are different types of attacks, e.g., $a_1$ = FGSM, $a_2$ = IFGSM, etc. The minimax defense for the combined attack is the solution to the mixed continuous-discrete problem
$\min_{u}\; \max_{k \in \{1,\ldots,K\}}\; R(u, a_k).$  (20)
Additionally, suppose $g_1(\cdot; v_1), \ldots, g_M(\cdot; v_M)$ are different types of learning-based attacks, e.g., $g_1$ = a 2-layer dense net, $g_2$ = a 5-layer convolutional net, etc. The minimax defense against the mixture of multiple fixed-type and learning-based attacks can be found by solving
$\min_{u}\; \max\Big\{\max_{k}\, R(u, a_k),\; \max_{m}\max_{v_m}\, R(u, g_m(\cdot; v_m))\Big\},$  (21)
that is, by minimizing the risk against the strongest attacker across the multiple attack classes. Note that the strongest attack class and its parameters change as the classifier changes. Due to the computational demand of solving Eq. 21, we leave the computation of minimax solutions against multiple classes of attacks as future work.
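In practice, the inner maximization of Eq. 21 amounts to evaluating every attack class on the current classifier and descending on whichever risk is currently largest. A hedged sketch of that inner max (the attack callables and their names are illustrative):

```python
import torch
import torch.nn.functional as F

def worst_case_risk(classifier, attacks, x, y):
    """Risk of the strongest attack among several attack classes (inner max of Eq. 21)."""
    risks = torch.stack([F.cross_entropy(classifier(attack(x, y)), y) for attack in attacks])
    return risks.max()   # differentiable: gradients flow only through the strongest attack

# Example usage (attacks could mix fixed-type and learning-based attacks):
#   attacks = [lambda x, y: fgsm_attack(classifier, x, y, eps), attnet]
#   worst_case_risk(classifier, attacks, x, y).backward()
```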
5.2 Adversarial examples and privacy attacks
Lastly, we discuss a bigger picture of the game between adversarial players. The minimax optimization arises in the leader-follower game [2] with the constant-sum constraint. The leader-follower setting makes sense because the defense (i.e., the classifier parameters) is often public knowledge and the attacker exploits that knowledge. Interestingly, the problem of an attack on privacy [10] has a very similar formulation to the adversarial attack problem, differing only in that the classifier is the attacker and the data perturber is the defender. In the problem of privacy preservation against inference, the defender is a data transformer (parameterized by $u$) which perturbs the raw data, and the attacker is a classifier (parameterized by $v$) who tries to extract sensitive information, such as identity, from the perturbed data, such as the online activity of a person. The transformer is the leader, such as when the privacy mechanism is public knowledge, and the classifier is the follower as it attacks the given perturbed data. The risk for the defender is therefore the accuracy of the inference of sensitive information, measured by $R(u, v)$. Solving the minimax risk problem ($\min_u \max_v R(u, v)$) gives us the best worst-case defense when the classifier/attacker knows the transformer/defender parameters, and therefore a robust data transformer that preserves privacy against the best inference attack (within the given class of attacks). On the other hand, solving the maximin risk problem ($\max_v \min_u R(u, v)$) gives us the best worst-case classifier/attacker when its parameters are known to the transformer. As one can see, the problems of adversarial attack and privacy attack are two sides of the same coin, which can potentially be addressed by similar frameworks and optimization algorithms.
6 Conclusion
In this paper, we explain and formulate the adversarial example problem in the context of a two-player continuous game. We analytically and numerically study the problem with two types of attack classes – gradient-based and network-based – and show the different properties of the solutions from those two classes. While a classifier robust to all types of attacks may yet be an elusive goal, we claim that the minimax defense is a very reasonable goal, and that such a defense can be computed for classes such as gradient-based or network-based attacks. We present optimization algorithms for numerically finding those defenses. The results with the MNIST and the CIFAR-10 datasets show that the classifier found by the proposed method outperforms non-minimax-optimal classifiers, and that the network-based attack is a strong class of attacks that should be considered in adversarial example research in addition to the more frequently used gradient-based attacks. As future work, we plan to study further the issue of transferability of a defense method to out-of-class attacks, and efficient minimax optimization algorithms for finding defenses against general attacks.
References
 Baluja and Fischer [2017] Baluja, S., Fischer, I.: Adversarial transformation networks: Learning to generate adversarial examples. arXiv preprint arXiv:1703.09387 (2017)
 Brückner and Scheffer [2011] Brückner, M., Scheffer, T.: Stackelberg games for adversarial prediction problems. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 547–555. ACM (2011)
 Carlini and Wagner [2017] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: Security and Privacy (SP), 2017 IEEE Symposium on. pp. 39–57. IEEE (2017)
 Dalvi et al. [2004] Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al.: Adversarial classification. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 99–108. ACM (2004)
 Fawzi et al. [2015] Fawzi, A., Fawzi, O., Frossard, P.: Analysis of classifiers’ robustness to adversarial perturbations. arXiv preprint arXiv:1502.02590 (2015)
 Globerson and Roweis [2006] Globerson, A., Roweis, S.: Nightmare at test time: robust learning by feature deletion. In: Proceedings of the 23rd international conference on Machine learning. pp. 353–360. ACM (2006)
 Goodfellow et al. [2014a] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems. pp. 2672–2680 (2014a)
 Goodfellow et al. [2014b] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014b)
 Gu and Rigazio [2014] Gu, S., Rigazio, L.: Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068 (2014)
 Hamm [2017] Hamm, J.: Minimax filter: learning to preserve privacy from inference attacks. The Journal of Machine Learning Research 18(1), 4704–4734 (2017)
 Hamm and Noh [2018] Hamm, J., Noh, Y.K.: K-beam minimax: Efficient optimization for deep adversarial learning. Accepted to International Conference on Machine Learning (ICML-18) (2018)
 Huang et al. [2015] Huang, R., Xu, B., Schuurmans, D., Szepesvári, C.: Learning with a strong adversary. arXiv preprint arXiv:1511.03034 (2015)
 Kurakin et al. [2016] Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016)
 Lanckriet et al. [2002] Lanckriet, G.R., Ghaoui, L.E., Bhattacharyya, C., Jordan, M.I.: A robust minimax approach to classification. Journal of Machine Learning Research 3(Dec), 555–582 (2002)
 Lu et al. [2017] Lu, J., Issaranon, T., Forsyth, D.: SafetyNet: Detecting and rejecting adversarial examples robustly. arXiv preprint arXiv:1704.00103 (2017)
 Lyu et al. [2015] Lyu, C., Huang, K., Liang, H.N.: A unified gradient regularization family for adversarial examples. In: Data Mining (ICDM), 2015 IEEE International Conference on. pp. 301–309. IEEE (2015)
 Madry et al. [2017] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
 Meng and Chen [2017] Meng, D., Chen, H.: MagNet: a two-pronged defense against adversarial examples. arXiv preprint arXiv:1705.09064 (2017)
 Mescheder et al. [2017] Mescheder, L., Nowozin, S., Geiger, A.: The numerics of GANs. In: Advances in Neural Information Processing Systems. pp. 1823–1833 (2017)
 Metz et al. [2016] Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163 (2016)
 Metzen et al. [2017] Metzen, J.H., Genewein, T., Fischer, V., Bischoff, B.: On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 (2017)
 Moosavi-Dezfooli et al. [2016] Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. arXiv preprint arXiv:1610.08401 (2016)
 Moosavi-Dezfooli et al. [2017] Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P., Soatto, S.: Analysis of universal adversarial perturbations. arXiv preprint arXiv:1705.09554 (2017)
 Nagarajan and Kolter [2017] Nagarajan, V., Kolter, J.Z.: Gradient descent GAN optimization is locally stable. In: Advances in Neural Information Processing Systems. pp. 5591–5600 (2017)
 Nguyen and Sinha [2017] Nguyen, L., Sinha, A.: A learning approach to secure learning. arXiv preprint arXiv:1709.04447 (2017)
 Papernot et al. [2016] Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: Security and Privacy (SP), 2016 IEEE Symposium on. pp. 582–597. IEEE (2016)
 Rall [1981] Rall, L.B.: Automatic differentiation: Techniques and applications (1981)
 Roth et al. [2017] Roth, K., Lucchi, A., Nowozin, S., Hofmann, T.: Stabilizing training of generative adversarial networks through regularization. In: Advances in Neural Information Processing Systems. pp. 2015–2025 (2017)
 Szegedy et al. [2013] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
 Tramèr et al. [2017] Tramèr, F., Kurakin, A., Papernot, N., Boneh, D., McDaniel, P.: Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204 (2017)
Appendix 0.A Results with MNIST
The architecture of the MNIST classifier is similar to the TensorFlow tutorial model (https://github.com/tensorflow/models/tree/master/tutorials/image/mnist), and it is trained with the following hyperparameters: batch size = 128, optimizer = AdamOptimizer, total number of iterations = 50,000.
The attack network has three hidden fully-connected layers of 300 units and is trained with the following hyperparameters: batch size = 128, dropout rate = 0.5, optimizer = AdamOptimizer, total number of iterations = 30,000. For minimax, saddle-point, and maximin optimization, the total number of iterations was 100,000. A fixed sensitivity-penalty coefficient was used in Alg. 1.
Appendix 0.B Results with CIFAR10
We preprocess the CIFAR-10 dataset by removing the mean and normalizing the pixel values with the standard deviation of all pixels in the image, followed by clipping and rescaling the values. The architecture of the CIFAR classifier is similar to the TensorFlow tutorial model (https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10) but is simplified further by removing the local response normalization layers. With this simple structure, we attained roughly 78% accuracy on the clean test data (a 0.222 test error; see the tables below). The classifier is trained with the following hyperparameters: batch size = 128, optimizer = AdamOptimizer, total number of iterations = 100,000.
The attack network has three hidden fully-connected layers of 300 units and is trained with the following hyperparameters: batch size = 128, dropout rate = 0.5, optimizer = AdamOptimizer, total number of iterations = 30,000. For minimax, saddle-point, and maximin optimization, the total number of iterations was 100,000. A fixed sensitivity-penalty coefficient was used in Alg. 1.
In the rest of the appendix, we repeat all the experiments with the MNIST dataset using the CIFAR-10 dataset.
Defense\Attack  No attack  FGSM  

ε=0.05  ε=0.06  ε=0.07  ε=0.08
No defense  0.222  0.766  0.790  0.807  0.823 
Defense\Attack  No attack  FGSM  

ε=0.05  ε=0.06  ε=0.07  ε=0.08
Adv train  n/a  0.425  0.452  0.466  0.470 
Defense\Attack  No attack  FGSM  FGSMcurr  

FGSM1  FGSM2  FGSM40  
ε=0.05  No defense  0.222  0.766  0.734  0.655  0.766
Adv FGSM1  0.215  0.425  0.533  0.420  0.533  
Adv FGSM2  0.206  0.422  0.456  0.406  0.501  
Adv FGSM40  0.210  0.370  0.412  0.348  0.588  
LWA  0.203  0.422  0.464  0.423  0.456  
MinimaxGrad  0.203  0.425  0.475  0.423  0.481  
ε=0.06  No defense  0.222  0.790  0.761  0.680  0.790
Adv FGSM1  0.215  0.452  0.565  0.440  0.565  
Adv FGSM2  0.208  0.447  0.482  0.431  0.517  
Adv FGSM40  0.216  0.398  0.431  0.353  0.599  
LWA  0.208  0.446  0.493  0.447  0.489  
MinimaxGrad  0.199  0.431  0.473  0.446  0.453  
ε=0.07  No defense  0.222  0.807  0.787  0.704  0.807
Adv FGSM1  0.214  0.466  0.555  0.450  0.555  
Adv FGSM2  0.206  0.456  0.490  0.445  0.501  
Adv FGSM40  0.218  0.397  0.416  0.346  0.423  
LWA  0.208  0.453  0.499  0.451  0.485  
MinimaxGrad  0.208  0.461  0.497  0.456  0.487  
ε=0.08  No defense  0.222  0.823  0.807  0.709  0.823
Adv FGSM1  0.213  0.470  0.533  0.462  0.533  
Adv FGSM2  0.204  0.459  0.466  0.462  0.476  
Adv FGSM40  0.226  0.422  0.421  0.331  0.338  
LWA  0.208  0.470  0.485  0.469  0.485  
MinimaxGrad  0.203  0.456  0.464  0.459  0.462 
Defense\Attack  FGSMcurr  AttNetcurr  worst  FGSMcurr  AttNetcurr  worst 

ε=0.05  ε=0.06
No defense  0.766  0.504  0.766  0.583  0.766  0.790
Adv FGSM1  0.533  0.356  0.533  0.565  0.473  0.565 
Adv FGSM40  0.588  0.454  0.588  0.599  0.442  0.599 
MinimaxGrad  0.481  0.343  0.481  0.453  0.484  0.484 
ε=0.07  ε=0.08
No defense  0.807  0.655  0.807  0.823  0.685  0.823
Adv FGSM1  0.555  0.499  0.555  0.535  0.678  0.678 
Adv FGSM40  0.423  0.669  0.669  0.338  0.797  0.797 
MinimaxGrad  0.487  0.529  0.529  0.462  0.607  0.607 
Defense\Attack  FGSMcurr  AttNetcurr  worst  FGSMcurr  AttNetcurr  worst 

ε=0.05  ε=0.06
MinimaxAttNet  0.731  0.239  0.731  0.733  0.248  0.733 
AltAttNet  0.721  0.238  0.721  0.743  0.255  0.743 
MinimaxGrad  0.481  0.343  0.481  0.453  0.484  0.484 
ε=0.07  ε=0.08
MinimaxAttNet  0.762  0.256  0.762  0.775  0.266  0.775 
AltAttNet  0.743  0.257  0.732  0.771  0.258  0.771 
MinimaxGrad  0.487  0.529  0.529  0.462  0.607  0.607 