1. Introduction
Deep learning has been demonstrating exceptional performance on several categories of machine learning problems and has been applied in many settings [8; 19; 32; 14; 15; 28; 22]. However, researchers have recently found that deep neural networks (DNNs) can be vulnerable to adversarial attacks [5; 23; 20], which raises concerns about applying deep learning in security-critical tasks. Adversarial attacks are implemented by generating adversarial examples, which are crafted by adding carefully designed distortions to legitimate inputs. Fig. 1 shows adversarial examples for targeted adversarial attacks that can fool DNNs.
The security properties of deep learning have been investigated from two aspects: (i) enhancing the robustness of DNNs under adversarial attacks and (ii) crafting adversarial examples to test the vulnerability of DNNs. For the former aspect, research works have been conducted by either filtering out added distortions [13; 3; 10; 35] or revising DNN models [26; 9; 11] to defend against adversarial attacks. For the latter aspect, adversarial examples have been generated heuristically [12; 29], iteratively [25; 20; 16; 34], or by solving optimization problems [31; 6; 7; 2]. These two aspects mutually benefit each other towards hardening DNNs under adversarial attacks. Our work deals with the problem from the latter aspect.

For targeted adversarial attacks, the crafted adversarial examples should be able to mislead the DNN to classify them as any target labels, as done in Fig. 1. Also, in a successful adversarial attack, the targeted misclassification should be achieved with the minimal distortion added to the original legal input. Here comes the question of how to measure the added distortion. Currently, in the literature, the $L_0$, $L_1$, $L_2$, and $L_\infty$ norms are used to measure the added distortions, and the corresponding attacks are respectively named $L_0$, $L_1$, $L_2$, and $L_\infty$ adversarial attacks. Even though no measure can be perfect for human perceptual similarity, these measures or attack types may be employed for different application specifications. This work bridges the literature gap by unifying all the types of attacks within a single intact framework.
In order to benchmark DNN defense techniques and to probe the limit of the DNN security level, we should develop the strongest adversarial attacks. For this purpose, we adopt the white-box attack assumption, in which the attackers have complete information about the DNN architectures and all the parameters. This is also a realistic assumption, because even with only black-box access to the DNN model, one can train a substitute model and transfer the attacks generated using the substitute model. For the same purpose, we adopt the optimization-based approach to generate adversarial examples. The objectives of the optimization problem should be (i) misleading the DNN to classify the adversarial example as a target label and (ii) minimizing the norm of the added distortion.
By leveraging ADMM (Alternating Direction Method of Multipliers) [4], an operator-splitting optimization approach, we provide a universal framework for $L_0$, $L_1$, $L_2$, and $L_\infty$ adversarial attacks. ADMM decomposes an original optimization problem into two correlated subproblems, each of which can be solved more efficiently or analytically, and then coordinates the solutions to the subproblems to construct a solution to the original problem. This decomposition-alternating procedure of ADMM blends the benefits of dual decomposition and augmented Lagrangian methods for solving problems with non-convex and combinatorial constraints. Therefore, ADMM introduces no additional suboptimality besides the original gradient-based backpropagation method commonly used in DNNs and provides a faster linear convergence rate than state-of-the-art iterative attacks [25; 20; 16; 34]. We also compare with the optimization-based approaches, i.e., the Carlini & Wagner (C&W) attack [6] and the Elastic-net (EAD) attack [7], which are currently the strongest attacks in the literature.

The major contributions of this work and its differences from the C&W and EAD attacks are summarized as follows:

With our ADMM-based universal framework, all of the $L_0$, $L_1$, $L_2$, and $L_\infty$ adversarial attacks can be implemented with only minor modifications, while C&W only performs $L_0$, $L_2$, and $L_\infty$ attacks and EAD only performs EN and $L_1$ attacks.

The C&W $L_0$ attack needs to run the $L_2$ attack iteratively to find the pixels with the least effect and fix them, thereby identifying a minimal subset of pixels for modification to generate an adversarial example.

The C&W $L_\infty$ attack, through naive optimization with gradient descent, may produce very poor initial results. C&W solve this issue by introducing a limit on the $L_\infty$ norm and reducing the limit iteratively.

The EAD attack minimizes a weighted sum of the $L_1$ and $L_2$ norms. However, a universal attack generation model is missing.

Our extensive experiments show that our attacks are so far the strongest. Besides the 100% attack success rate, our ADMM-based attacks outperform the C&W and EAD attacks in each attack type in terms of minimal distortion.
Besides comparing with C&W, EAD and other attacks, we also test our attacks against defenses such as defensive distillation
[26] and adversarial training [33], demonstrating the success of our attacks. In addition, we validate the transferability of our attacks onto different DNN models. The code of our attacks to reproduce the results will be made available upon publication of this work.

2. Related Work
We introduce the most representative attacks and defenses in this section.
2.1. Adversarial Attacks
L-BFGS Attack [31] is the first optimization-based attack; it is an $L_2$ attack that uses the $L_2$ norm to measure the distortion in the optimization objective function.
JSMA Attack [25] is an $L_0$ attack that uses a greedy algorithm: it picks the most influential pixels by calculating a Jacobian-based saliency map and modifies the pixels iteratively. Its computational complexity is prohibitive even when applied to the ImageNet dataset.
FGSM [12] and IFGSM [20] Attacks are $L_\infty$ attacks that utilize the gradient of the loss function to determine the direction in which to modify the pixels. They are designed to be fast, rather than optimal. They can be used for adversarial training by directly changing the loss function instead of explicitly injecting adversarial examples into the training data. The fast gradient method (FGM) and the iterative fast gradient method (IFGM) are improvements of FGSM and IFGSM, respectively, that can be fitted as $L_1$, $L_2$, and $L_\infty$ attacks.

C&W Attacks [6] are a series of $L_0$, $L_2$, and $L_\infty$ attacks that achieve a 100% attack success rate with much lower distortions compared with the above-mentioned attacks. In particular, the C&W $L_2$ attack is superior to the L-BFGS attack (which is also an $L_2$ attack) because it uses a better objective function.
EAD Attack [7] formulates the process of crafting adversarial examples as an elastic-net regularized optimization problem. Elastic-net regularization is a linear mixture of the $L_1$ and $L_2$ norms used in the penalty function. The EAD attack is able to craft $L_1$-oriented adversarial examples and includes the C&W $L_2$ attack as a special case.
2.2. Representative Defenses
Defensive Distillation [26] introduces a temperature $T$ into the softmax layer and uses a higher temperature for training and a lower temperature for testing. The training phase first trains a teacher model that can produce soft labels for the training dataset and then trains a distilled model using the training dataset with soft labels. The distilled model with reduced temperature is preserved for testing.
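The temperature softmax at the core of defensive distillation can be sketched with NumPy; the logits and temperatures below are made-up values for illustration, not taken from the paper's experiments:

```python
import numpy as np

def softmax_t(logits, T=1.0):
    # Softmax with temperature T; a large T flattens the distribution.
    z = logits / T
    z = z - np.max(z)            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([8.0, 2.0, 0.0])
p_sharp = softmax_t(logits, T=1.0)    # near one-hot prediction
p_soft = softmax_t(logits, T=20.0)    # soft labels used for distillation
print(p_sharp.round(3), p_soft.round(3))
```

Training the distilled model on such soft labels and then deploying it at a lower temperature leaves the relative order of the logits unchanged, which is what attacks operating on relative logit values later exploit.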
Adversarial Training [33] injects adversarial examples with correct labels into the training dataset and then retrains the neural network, thus increasing robustness of DNNs under adversarial attacks.
3. An ADMM-Based Universal Framework for Adversarial Attacks
ADMM was first introduced in the mid-1970s with roots in the 1950s, and the algorithm and theory had been established by the mid-1990s. It was recently revisited and popularized by Boyd et al. for statistics and machine learning problems with a very large number of features or training examples [4]. The ADMM method takes the form of a decomposition-alternating procedure, in which the solutions to small local subproblems are coordinated to find a solution to a large global problem. It can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization.

ADMM was developed in part to bring robustness to the dual ascent method and, in particular, to yield convergence without assumptions like strict convexity or finiteness of the objective. ADMM is also capable of dealing with combinatorial constraints due to its decomposition property. It can be used in many practical applications where the convexity of the objective cannot be guaranteed or where combinatorial constraints are present. Besides, it converges quickly in many cases since the two blocks of variables are updated in an alternating or sequential fashion, which accounts for the term alternating direction.
3.1. Notations and Definitions
In this paper, we mainly evaluate adversarial attacks on image classification tasks. A two-dimensional matrix $x \in \mathbb{R}^{h \times w}$ represents a grayscale image with height $h$ and width $w$. For a colored RGB image with three channels, a three-dimensional tensor $x \in \mathbb{R}^{h \times w \times 3}$ is utilized. Each element $x_i$ represents the value of the $i$th pixel and is scaled to the range $[0, 1]$. A neural network has the model $y = F(x)$, where $F$ generates an output $y$ given an input $x$. Model $F$ is fixed since we perform attacks on given neural network models. The output layer performs a softmax operation and the neural network is an $m$-class classifier. Let the logits $Z(x)$ denote the input to the softmax layer, which represents the output of all layers except for the softmax layer. We have $F(x) = \mathrm{softmax}(Z(x))$. The $i$th element $F(x)_i$ of the output vector represents the probability that input $x$ belongs to the $i$th class. The output vector $F(x)$ is treated as a probability distribution, and its elements satisfy $F(x)_i \in [0, 1]$ and $\sum_{i=1}^{m} F(x)_i = 1$. The neural network classifies input $x$ according to the maximum probability, i.e., $C(x) = \arg\max_i F(x)_i$.

The adversarial attack can be either targeted or untargeted. Given an original legal input $x_0$ with its correct label $t_0$, the untargeted adversarial attack is to find an input $x'$ satisfying $C(x') \neq t_0$ while $x'$ and $x_0$ are close according to some measure of the distortion. The untargeted adversarial attack does not specify any target label to mislead the classifier. In the targeted adversarial attack, with a given target label $t \neq t_0$, an adversarial example is an input $x'$ such that $C(x') = t$ while $x'$ and $x_0$ are close according to some measure of the distortion. In this work, we consider targeted adversarial attacks since they are believed to be stronger than untargeted attacks.
3.2. General ADMM Framework for Adversarial Attacks
The initial problem of constructing adversarial examples is defined as: given an original legal input image $x_0$ and a target label $t$, find an adversarial example $x' = x_0 + \delta$ such that $D(\delta)$ is minimized, $C(x') = t$, and $x' \in [0, 1]^n$. Here $\delta$ is the distortion added onto the input $x_0$, $C(\cdot)$ is the classification function of the neural network, and the adversarial example $x'$ is classified as the target label $t$.

$D(\delta)$ is a measure of the distortion $\delta$. We need to measure the distortion between the original legal input $x_0$ and the adversarial example $x'$. $L_p$ norms are the most commonly used measures in the literature. The $L_p$ norm of the distortion between $x'$ and $x_0$ is defined as:

$$\|x' - x_0\|_p = \Big( \sum_{i=1}^{n} |x'_i - x_{0,i}|^p \Big)^{1/p} \qquad (1)$$

We see the use of the $L_0$, $L_1$, $L_2$, and $L_\infty$ norms in different attacks.

$L_0$ norm: measures the number of mismatched elements between $x'$ and $x_0$.

$L_1$ norm: measures the sum of the absolute values of the differences between $x'$ and $x_0$.

$L_2$ norm: measures the standard Euclidean distance between $x'$ and $x_0$.

$L_\infty$ norm: measures the maximum difference between $x'_i$ and $x_{0,i}$ over all $i$'s.
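The four distortion measures above can be illustrated with a short NumPy sketch; the 3-pixel inputs below are made-up values for illustration only:

```python
import numpy as np

# Made-up original input and adversarial example (3 pixels, range [0, 1]).
x0 = np.array([0.10, 0.50, 0.90])
x_adv = np.array([0.10, 0.65, 0.85])
delta = x_adv - x0

l0 = np.count_nonzero(delta)          # L0: number of changed elements
l1 = np.sum(np.abs(delta))            # L1: sum of absolute differences
l2 = np.sqrt(np.sum(delta ** 2))      # L2: Euclidean distance
linf = np.max(np.abs(delta))          # L_inf: largest single-pixel change

print(l0, l1, l2, linf)
```

Here two pixels change, so the $L_0$ norm is 2, while the $L_\infty$ norm only reflects the largest single change (0.15).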
In this work, with a general ADMM-based framework, we implement the $L_0$, $L_1$, $L_2$, and $L_\infty$ attacks, respectively. When generating adversarial examples in the four attacks, $D(\delta)$ in the objective function becomes the $L_0$, $L_1$, $L_2$, and $L_\infty$ norms, respectively. For simplicity of expression, in the general ADMM-based framework, the form $D(\delta)$ is used to denote the measure of $\delta$. When introducing the four detailed attacks based on the ADMM framework, we utilize the form of the $L_p$ norm to represent the distortion measure.
ADMM provides a systematic way to deal with non-convex and combinatorial constraints by breaking the initial problem into two subproblems. To do this, the initial problem is first transformed into the following problem, introducing an auxiliary variable $z$:

$$\min_{z,\,\delta}\ D(z) + f(x_0 + \delta) \quad \text{s.t.}\ \ z = \delta,\ \ x_0 + \delta \in [0, 1]^n \qquad (2)$$

where $f(\cdot)$ has the form:

$$f(x_0 + \delta) = \begin{cases} 0, & C(x_0 + \delta) = t \\ +\infty, & \text{otherwise} \end{cases} \qquad (3)$$

Here $Z(x)$ denotes the logits before the softmax layer, and $Z(x)_j$ means the $j$th element of $Z(x)$; they will be used to build a tractable form of $f$. The function $f$ ensures that the input is classified with target label $t$. The augmented Lagrangian function of problem (2) is as follows:

$$L_\rho(z, \delta, \lambda) = D(z) + f(x_0 + \delta) + \lambda^T(\delta - z) + \frac{\rho}{2}\|\delta - z\|_2^2 \qquad (4)$$

where $\lambda$ is the dual variable or Lagrange multiplier and $\rho > 0$ is called the penalty parameter. Using the scaled form of ADMM by defining $u = (1/\rho)\lambda$, we have:

$$L_\rho(z, \delta, u) = D(z) + f(x_0 + \delta) + \frac{\rho}{2}\|\delta - z + u\|_2^2 - \frac{\rho}{2}\|u\|_2^2 \qquad (5)$$
ADMM solves problem (2) through iterations. In the $k$th iteration, the following steps are performed:

$$z^{k+1} = \arg\min_{z}\ L_\rho(z, \delta^k, u^k) \qquad (6)$$

$$\delta^{k+1} = \arg\min_{\delta}\ L_\rho(z^{k+1}, \delta, u^k) \qquad (7)$$

$$u^{k+1} = u^k + \delta^{k+1} - z^{k+1} \qquad (8)$$

In Eqn. (6), we find $z^{k+1}$, which minimizes $L_\rho$ with fixed $\delta^k$ and $u^k$. Similarly, in Eqn. (7), $z^{k+1}$ and $u^k$ are fixed and we find $\delta^{k+1}$ minimizing $L_\rho$. The dual variable $u$ is then updated accordingly. Note that the two variables $z$ and $\delta$ are updated in an alternating or sequential fashion, from which the term alternating direction comes. The iteration converges when:

$$\|\delta^{k+1} - z^{k+1}\|_2^2 \le \epsilon_1 \quad \text{and} \quad \|z^{k+1} - z^k\|_2^2 \le \epsilon_2 \qquad (9)$$
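The alternating updates of Eqns. (6)-(8) can be illustrated with a minimal scalar sketch; the quadratic choices of $D$ and $f$ below are made-up stand-ins (not the attack objectives), chosen so that both updates have closed forms and the true optimum is $z = \delta = 1.5$:

```python
# Scaled-form ADMM on the toy problem: minimize z^2 + (delta - 3)^2
# subject to z = delta; the optimum is z = delta = 1.5.
rho = 5.0
z, delta, u = 0.0, 0.0, 0.0
for k in range(200):
    # Eqn. (6): z-update, closed form for D(z) = z^2
    z = rho * (delta + u) / (2.0 + rho)
    # Eqn. (7): delta-update, closed form for f(delta) = (delta - 3)^2
    delta = (6.0 + rho * (z - u)) / (2.0 + rho)
    # Eqn. (8): dual update accumulates the residual delta - z
    u = u + delta - z
print(z, delta)  # both converge to 1.5
```

In the attack framework the two argmin steps are not scalar quadratics, but the alternating structure is exactly the same.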
Equivalently, in each iteration, we solve two optimization subproblems corresponding to Eqns. (6) and (7), respectively:

$$\min_{z}\ D(z) + \frac{\rho}{2}\|\delta^k - z + u^k\|_2^2 \qquad (10)$$

and

$$\min_{\delta}\ f(x_0 + \delta) + \frac{\rho}{2}\|\delta - z^{k+1} + u^k\|_2^2 \qquad (11)$$
The non-differentiable $f$ in Eqn. (3) makes it difficult to solve the second subproblem (11). Therefore, a new differentiable $f$ inspired by [6] is utilized as follows:

$$f(x) = c \cdot \max\Big\{ \max_{j \neq t} Z(x)_j - Z(x)_t,\ -\kappa \Big\} \qquad (12)$$

Then, stochastic gradient descent methods can be used to solve this subproblem. The Adam optimizer [17] is applied due to its fast and robust convergence behavior. In the new $f$ of Eqn. (12), $\kappa$ is a confidence parameter denoting the strength of adversarial example transferability: the larger $\kappa$, the stronger the transferability of the adversarial example. It can be kept at 0 if we do not evaluate the transferability.
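A sketch of the surrogate loss in Eqn. (12), following the C&W hinge form [6]; the logits vector and the helper name `attack_loss` are made up for illustration:

```python
import numpy as np

def attack_loss(Z, t, kappa=0.0, c=1.0):
    """c * max(max_{j != t} Z_j - Z_t, -kappa), as in Eqn. (12)."""
    other = np.max(np.delete(Z, t))   # best competing logit
    return c * max(other - Z[t], -kappa)

Z = np.array([2.0, 5.0, 1.0])         # made-up logits for 3 classes
print(attack_loss(Z, t=1))            # target already wins: loss is non-positive
print(attack_loss(Z, t=2))            # target trails by 4: positive loss
```

Driving this loss down to $-\kappa$ forces the target logit to exceed every other logit by the margin $\kappa$, which is why a larger $\kappa$ yields more transferable examples.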
3.3. Box Constraint
The constraint on i.e., is known as a “box constraint” in the optimization literature. We use a new variable and instead of optimizing over defined above, we optimize over , based on:
(13) 
Here the is performed elementwise. Since , the method will automatically satisfy the box constraint and allows us to use optimization algorithms that do not natively support box constraints.
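The change of variables in Eqn. (13) can be sketched as follows; any real-valued $w$ maps into the box $[0, 1]$:

```python
import numpy as np

def w_to_image(w):
    # Eqn. (13): x0 + delta = 0.5 * (tanh(w) + 1), elementwise
    return 0.5 * (np.tanh(w) + 1.0)

w = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])   # unconstrained values
x = w_to_image(w)
assert np.all((x >= 0.0) & (x <= 1.0))        # box constraint holds for free
```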
3.4. Selection of Target Label
For targeted attacks, there are different ways to choose the target labels:

Average Case: select at random the target label uniformly among all the labels that are not the correct label.

Best Case: perform attacks using all incorrect labels, and report the target label that is the least difficult to attack.

Worst Case: perform attacks using all incorrect labels, and report the target label that is the most difficult to attack.
We evaluate the performance of the proposed ADMM attacks in the three cases mentioned above.
3.5. Discussion on Constants
There are two constants, $c$ and $\rho$, in the two subproblems (10) and (11). Different policies are adopted for choosing appropriate $c$ and $\rho$ in the $L_0$, $L_1$, $L_2$, and $L_\infty$ attacks. In the $L_2$ attack, since $\rho$ acts in both problems (10) and (11), we fix $\rho$ and change $c$ to improve the solutions. We find that the best choice of $c$ is the smallest one that can help achieve $f(x_0 + \delta) = 0$ in the subproblem (11). Thus, a modified binary search is used to find a satisfying $c$. For the ADMM $L_0$ attack, as $\rho$ has a stronger and more direct influence on the solutions, $c$ is fixed and adaptive search of $\rho$ is utilized. More details are provided in Section 4.2. For the ADMM $L_1$ and $L_\infty$ attacks, as we find fixed $c$ and $\rho$ can achieve good performance, $c$ and $\rho$ are kept unchanged and the adaptive search method is not used.
4. Instantiations of the $L_0$, $L_1$, $L_2$, and $L_\infty$ Attacks based on the ADMM Framework

The ADMM framework for adversarial attacks now needs to solve the two subproblems (10) and (11). The difference among the $L_0$, $L_1$, $L_2$, and $L_\infty$ attacks lies in the subproblem (10), while the processes of solving the subproblem (11) based on stochastic gradient descent are very similar for the four attacks.
4.1. $L_2$ Attack

For the $L_2$ attack, the subproblem (10) has the form:

$$\min_{z}\ \|z\|_2^2 + \frac{\rho}{2}\|\delta^k - z + u^k\|_2^2 \qquad (14)$$

the solution to which can be directly derived in an analytical format:

$$z^{k+1} = \frac{\rho}{2 + \rho}\,(\delta^k + u^k) \qquad (15)$$

Then the complete solution to the $L_2$ attack problem using the ADMM framework is as follows: for the $k$th iteration,

$$z^{k+1} = \frac{\rho}{2 + \rho}\Big(\tfrac{1}{2}\big(\tanh(w^k) + 1\big) - x_0 + u^k\Big) \qquad (16)$$

$$w^{k+1} = \arg\min_{w}\ f\Big(\tfrac{1}{2}\big(\tanh(w) + 1\big)\Big) + \frac{\rho}{2}\Big\|\tfrac{1}{2}\big(\tanh(w) + 1\big) - x_0 - z^{k+1} + u^k\Big\|_2^2 \qquad (17)$$

$$u^{k+1} = u^k + \tfrac{1}{2}\big(\tanh(w^{k+1}) + 1\big) - x_0 - z^{k+1} \qquad (18)$$

Eqn. (16) corresponds to the analytical solution to the subproblem (10), i.e., problem (14), with Eqn. (13) replacing $\delta$ in Eqn. (15). Eqn. (17) corresponds to the subproblem (11) with Eqn. (13) replacing $\delta$ and $f$ taking the form of Eqn. (12). The solution to Eqn. (17) is derived through the Adam optimizer with stochastic gradient descent.
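The analytical z-update of Eqns. (15)-(16) is a simple shrinkage of $\delta^k + u^k$ toward zero; a minimal sketch with made-up values:

```python
import numpy as np

def z_update_l2(delta, u, rho):
    # Eqn. (15): minimizer of ||z||_2^2 + (rho/2) * ||delta - z + u||_2^2
    return (rho / (2.0 + rho)) * (delta + u)

delta = np.array([0.4, -0.2])
u = np.array([0.1, 0.0])
z = z_update_l2(delta, u, rho=2.0)   # the factor rho/(2+rho) equals 0.5 here
print(z)
```

As $\rho \to \infty$ the update leaves $\delta^k + u^k$ essentially unchanged, while a small $\rho$ shrinks the distortion harder toward zero.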
4.2. $L_0$ Attack

For the $L_0$ attack, the subproblem (10) has the form:

$$\min_{z}\ \|z\|_0 + \frac{\rho}{2}\|\delta^k - z + u^k\|_2^2 \qquad (19)$$

Its equivalent optimization problem, with $a = \delta^k + u^k$, is as follows:

$$\min_{z}\ \frac{2}{\rho}\|z\|_0 + \|z - a\|_2^2 \qquad (20)$$

The solution to problem (19) can be obtained through the solution $z^*$ to problem (20). The solution to problem (20) can be derived in this way: let $z$ be equal to $a$ first; then, for each element of $z$, if its square is smaller than $2/\rho$, make it zero. A proof for the solution is given in the following.
Lemma 4.1.

Suppose that two matrices $z$, $a$ are of the same size with $n$ elements each, and that there are at least $m$ zero elements in $z$. Then the optimal value of the following problem is the sum of the squares of the $m$ smallest-magnitude elements of $a$:

$$\min_{z:\ \|z\|_0 \le n - m}\ \|z - a\|_2^2 \qquad (21)$$

The proof of the lemma is straightforward and we omit it for the sake of brevity. We use $S_m(a)$ to denote the sum of the $m$ smallest $a_i^2$ ($a_i$ is an element of $a$).

Theorem 4.2.

Set $z = a$ and then make those elements of $z$ zero whose squares are smaller than $2/\rho$. Such a $z$ would yield the minimum objective value of problem (20).
Proof.

Suppose that $z^*$ is constructed according to the above rule in Theorem 4.2 and has $m^*$ elements equal to 0. We need to prove that $z^*$ is the optimal solution with the minimum objective value. Suppose we have another arbitrary solution $z'$ with $m'$ elements equal to 0; by Lemma 4.1, its quadratic term is at least $S_{m'}(a)$. Both $z^*$ and $z'$ have $n$ elements. The objective value of solution $z^*$ is:

$$\frac{2}{\rho}(n - m^*) + S_{m^*}(a) \qquad (22)$$

The objective value of solution $z'$ is at least:

$$\frac{2}{\rho}(n - m') + S_{m'}(a) \qquad (23)$$

If $m' > m^*$, then according to the definition of $z^*$, every element of $a$ outside the $m^*$ smallest ones has square at least $2/\rho$, so we have

$$S_{m'}(a) - S_{m^*}(a) \ge (m' - m^*)\,\frac{2}{\rho} \qquad (24)$$

So that

$$\frac{2}{\rho}(n - m') + S_{m'}(a) \ge \frac{2}{\rho}(n - m^*) + S_{m^*}(a) \qquad (25)$$

If $m' < m^*$, then according to the definition of $z^*$, each of the $m^* - m'$ extra elements zeroed in $z^*$ has square smaller than $2/\rho$, so we have

$$S_{m^*}(a) - S_{m'}(a) \le (m^* - m')\,\frac{2}{\rho} \qquad (26)$$

So that

$$\frac{2}{\rho}(n - m^*) + S_{m^*}(a) \le \frac{2}{\rho}(n - m') + S_{m'}(a) \qquad (27)$$

Thus, we can see that our solution $z^*$ achieves the minimum objective value and is the optimal solution. ∎
When solving the subproblem (19) according to Theorem 4.2, we enforce a hidden constraint on the distortion $z$: the square of each nonzero element of $z$ must be larger than $2/\rho$. Therefore, a smaller $\rho$ pushes the ADMM method to find a $z$ with larger nonzero elements, thus reducing the number of nonzero elements and decreasing the $L_0$ norm. Empirically, we find that the constant $\rho$ represents a trade-off between attack success rate and $L_0$ norm of the distortion, i.e., a larger $\rho$ can help find solutions with a higher attack success rate at the cost of a larger $L_0$ norm.
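The hard-thresholding rule of Theorem 4.2 can be sketched as follows (made-up values; `z_update_l0` is a hypothetical helper name):

```python
import numpy as np

def z_update_l0(delta, u, rho):
    # Theorem 4.2: start from a = delta + u and zero every element
    # whose square is below the threshold 2 / rho.
    a = delta + u
    z = a.copy()
    z[a ** 2 < 2.0 / rho] = 0.0
    return z

a = np.array([0.9, 0.05, -0.8, 0.1])
z = z_update_l0(a, np.zeros(4), rho=10.0)   # threshold 2/rho = 0.2
print(z)  # the small entries 0.05 and 0.1 are zeroed
```

Consistent with the discussion above, raising $\rho$ lowers the threshold $2/\rho$, so more elements survive and the $L_0$ norm grows.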
4.3. $L_1$ Attack

For the $L_1$ attack, the subproblem (10) has the form:

$$\min_{z}\ \|z\|_1 + \frac{\rho}{2}\|\delta^k - z + u^k\|_2^2 \qquad (28)$$

Problem (28) has a closed-form solution. If we let $a = \delta^k + u^k$ and scale the objective by $1/\rho$, the problem becomes

$$\min_{z}\ \frac{1}{\rho}\|z\|_1 + \frac{1}{2}\|z - a\|_2^2 \qquad (29)$$

The solution of problem (29) is given by the soft-thresholding operator $S_\lambda$ evaluated at the point $a$ with parameter $\lambda = 1/\rho$ [27],

$$S_\lambda(a)_i = \operatorname{sign}(a_i)\,\max\big(|a_i| - \lambda,\ 0\big) \qquad (30)$$

where the operation is taken elementwise, and $\operatorname{sign}(a_i) = 1$ if $a_i \ge 0$, and $-1$ otherwise. Therefore, the solution to problem (28) is given by

$$z^{k+1} = S_{1/\rho}(\delta^k + u^k) \qquad (31)$$
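The soft-thresholding update of Eqns. (30)-(31) can be sketched as follows (made-up values):

```python
import numpy as np

def soft_threshold(a, lam):
    # Eqn. (30): shrink each element toward zero by lam, elementwise
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def z_update_l1(delta, u, rho):
    # Eqn. (31): z^{k+1} = S_{1/rho}(delta^k + u^k)
    return soft_threshold(delta + u, 1.0 / rho)

z = z_update_l1(np.array([0.5, -0.05, 0.2]), np.zeros(3), rho=10.0)
print(z)  # lam = 0.1: entries become 0.4, 0 and 0.1
```

Unlike the hard thresholding of the $L_0$ attack, every surviving element is also shrunk by $\lambda$, which is what promotes a sparse but small-magnitude distortion.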
4.4. $L_\infty$ Attack

For the $L_\infty$ attack, the subproblem (10) has the form:

$$\min_{z}\ \|z\|_\infty + \frac{\rho}{2}\|\delta^k - z + u^k\|_2^2 \qquad (32)$$

This problem does not have a closed-form solution. One possible method is to derive the KKT conditions of problem (32) [27]. Here we use stochastic gradient descent methods to solve it. In the experiments, we find that the Adam optimizer [17] achieves fast and robust convergence, so the Adam optimizer is utilized to solve Eqn. (32). Since Eqn. (32) is relatively simple compared with Eqn. (17), the complexity of solving Eqn. (32) with the Adam optimizer is negligible.
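For illustration, here is a plain subgradient-descent sketch of subproblem (32); the paper uses the Adam optimizer, so this simplified loop (with made-up values) only shows the structure of the objective:

```python
import numpy as np

def obj(z, a, rho):
    # ||z||_inf + (rho/2) * ||a - z||_2^2, with a = delta^k + u^k
    return np.max(np.abs(z)) + 0.5 * rho * np.sum((a - z) ** 2)

def z_update_linf(a, rho, steps=2000, lr=0.01):
    z = a.copy()
    for _ in range(steps):
        g = rho * (z - a)            # gradient of the quadratic term
        j = np.argmax(np.abs(z))     # a subgradient of the L_inf term
        g[j] += np.sign(z[j])
        z -= lr * g
    return z

a = np.array([0.5, -0.4, 0.1])
z = z_update_linf(a, rho=10.0)
assert obj(z, a, 10.0) < obj(a, a, 10.0)   # improved over the start z = a
```

Only the largest-magnitude coordinate receives the $L_\infty$ subgradient, so the update pulls down the worst-case distortion while the quadratic term keeps $z$ close to $\delta^k + u^k$.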
5. Performance Evaluation
Table 1: $L_2$ attack comparison: attack success rate (ASR, %) and average $L_2$, $L_1$, and $L_\infty$ distortions of successful adversarial examples.

Data Set | Attack Method | Best Case | | | | Average Case | | | | Worst Case | | |
| | ASR | $L_2$ | $L_1$ | $L_\infty$ | ASR | $L_2$ | $L_1$ | $L_\infty$ | ASR | $L_2$ | $L_1$ | $L_\infty$
MNIST | FGM($L_2$) | 99.4 | 2.245 | 25.84 | 0.574 | 34.6 | 3.284 | 39.15 | 0.747 | 0 | N.A. | N.A. | N.A.
| IFGM($L_2$) | 100 | 1.58 | 18.51 | 0.388 | 99.9 | 2.50 | 32.63 | 0.562 | 99.6 | 3.958 | 55.04 | 0.783
| C&W($L_2$) | 100 | 1.393 | 13.57 | 0.402 | 100 | 2.002 | 22.31 | 0.54 | 99.9 | 2.598 | 31.43 | 0.689
| ADMM($L_2$) | 100 | 1.288 | 13.87 | 0.345 | 100 | 1.873 | 22.52 | 0.498 | 100 | 2.445 | 31.427 | 0.669
CIFAR10 | FGM($L_2$) | 99.5 | 0.421 | 14.13 | 0.05 | 42.8 | 1.157 | 39.5 | 0.136 | 0.7 | 3.115 | 107.1 | 0.369
| IFGM($L_2$) | 100 | 0.191 | 6.549 | 0.022 | 100 | 0.432 | 15.13 | 0.047 | 100 | 0.716 | 25.22 | 0.079
| C&W($L_2$) | 100 | 0.178 | 6.03 | 0.019 | 100 | 0.347 | 12.115 | 0.0364 | 99.9 | 0.481 | 16.75 | 0.0536
| ADMM($L_2$) | 100 | 0.173 | 5.8 | 0.0192 | 100 | 0.337 | 11.65 | 0.0365 | 100 | 0.476 | 16.73 | 0.0535
ImageNet | FGM($L_2$) | 12 | 2.29 | 752.9 | 0.087 | 1 | 6.823 | 2338 | 0.25 | 0 | N.A. | N.A. | N.A.
| IFGM($L_2$) | 100 | 1.057 | 349.55 | 0.034 | 100 | 2.461 | 823.52 | 0.083 | 98 | 4.448 | 1478.8 | 0.165
| C&W($L_2$) | 100 | 0.48 | 142.4 | 0.016 | 100 | 0.681 | 215.4 | 0.03 | 100 | 0.866 | 275.4 | 0.042
| ADMM($L_2$) | 100 | 0.416 | 117.3 | 0.015 | 100 | 0.568 | 177.6 | 0.022 | 97 | 0.701 | 229.08 | 0.0322
The proposed ADMM attacks are compared with state-of-the-art attacks, including the C&W attacks [6], the EAD attack, and the FGM and IFGM attacks, on three image classification datasets: MNIST [21], CIFAR10 [18] and ImageNet [8]. We also test our attacks against two defenses, defensive distillation [26] and adversarial training [33], and evaluate the transferability of the ADMM attacks.
5.1. Experiment Setup and Parameter Setting
Our experiment setup is based on the C&W attack setup for fair comparisons. Two networks are trained for the MNIST and CIFAR10 datasets, respectively. For the ImageNet dataset, a pre-trained network is utilized. The network architecture for MNIST and CIFAR10 has four convolutional layers, two max pooling layers, two fully connected layers and a softmax layer. It can achieve 99.5% accuracy on MNIST and 80% accuracy on CIFAR10. For ImageNet, a pre-trained Inception v3 network [30] is applied, so there is no need to train our own model. The Google Inception model can achieve 96% top-5 accuracy with image inputs of size $299 \times 299 \times 3$. All experiments are conducted on machines with an Intel i7-7700K CPU, 32 GB RAM and an NVIDIA GTX 1080 Ti GPU.

The implementations of FGM and IFGM are based on the CleverHans package [24]. The key distortion parameter $\epsilon$ is determined through a fine-grained grid search. For each image, the smallest $\epsilon$ in the grid leading to a successful attack is reported. For IFGM, we perform 10 FGM iterations, and the distortion parameter of each FGM iteration is set to a fraction of $\epsilon$, which was shown to be quite effective in [33].
The implementations of the C&W attacks and the EAD attack are based on the GitHub code released by the authors. The EAD attack has two decision rules for selecting the final adversarial example: the least elastic-net (EN) rule and the least $L_1$ distortion rule. Usually, the $L_1$ decision rule can achieve lower $L_1$ distortion than the EN decision rule, as the EN decision rule considers a mixture of the $L_1$ and $L_2$ distortions. We use the $L_1$ decision rule for fair comparison.
5.2. Attack Success Rate and Distortion for the ADMM $L_2$ Attack

The ADMM $L_2$ attack is compared with the FGM, IFGM and C&W $L_2$ attacks. The attack success rate (ASR) represents the percentage of the constructed adversarial examples that are successfully classified as target labels. The average distortion of all successful adversarial examples is reported. For zero ASR, the distortion is not available (N.A.). We craft adversarial examples on MNIST, CIFAR10 and ImageNet. For MNIST and CIFAR10, 1000 correctly classified images are randomly selected from the test sets and 9 target labels are tested for each image, so we perform 9000 attacks for each dataset using each attack method. For ImageNet, 100 correctly classified images are randomly selected and 9 random target labels are used for each image.

The parameter $\rho$ is fixed to 20. The number of ADMM iterations is set to 10. In each ADMM iteration, the Adam optimizer is utilized to solve the second subproblem based on stochastic gradient descent. When using the Adam optimizer, we do binary search for 9 steps on the parameter $c$ (starting from 0.001) and run 1000 learning iterations for each $c$, with learning rate 0.02 for MNIST and 0.002 for CIFAR10 and ImageNet. The attack transferability parameter $\kappa$ is set to 0.
Table 1 shows the results on MNIST, CIFAR10 and ImageNet. As we can see, FGM fails to generate adversarial examples with a high success rate since it is designed to be fast, rather than optimal. Among the IFGM, C&W and ADMM attacks, ADMM achieves the lowest $L_2$ distortion for the best case, average case and worst case. IFGM has larger distortions compared with the C&W and ADMM attacks on the three datasets, especially on ImageNet. For MNIST, the ADMM attack can reduce the $L_2$ distortion by about 7% compared with the C&W attack. This becomes more prominent on ImageNet, where ADMM reduces the distortion by 19% compared with C&W in the worst case.

We also observe that on CIFAR10, the ADMM attack achieves lower distortions but the reductions are not as prominent as those on MNIST or ImageNet. The reason may be that CIFAR10 is the easiest dataset to attack since it requires the lowest distortion among the three datasets, so both the ADMM attack and the C&W attack can achieve quite good performance. Note that in most cases on the three datasets, the ADMM attack achieves lower $L_1$, $L_2$ and $L_\infty$ distortions than the C&W attack, indicating a comprehensive enhancement of the ADMM attack over the C&W attack.
Table 2: $L_0$ attack comparison: ASR (%) and average $L_0$ distortion of successful adversarial examples.

Dataset | Attack method | Best case | | Average case | | Worst case |
| | ASR | $L_0$ | ASR | $L_0$ | ASR | $L_0$
MNIST | C&W($L_0$) | 100 | 8.1 | 100 | 17.48 | 100 | 31.48
| ADMM($L_0$) | 100 | 8 | 100 | 15.71 | 100 | 25.87
CIFAR | C&W($L_0$) | 100 | 8.6 | 100 | 19.6 | 100 | 34.4
| ADMM($L_0$) | 100 | 8.25 | 100 | 18.8 | 100 | 31.2
Table 3: $L_1$ attack comparison: ASR (%) and average $L_1$, $L_2$, and $L_\infty$ distortions of successful adversarial examples.

Data Set | Attack Method | Best Case | | | | Average Case | | | | Worst Case | | |
| | ASR | $L_1$ | $L_2$ | $L_\infty$ | ASR | $L_1$ | $L_2$ | $L_\infty$ | ASR | $L_1$ | $L_2$ | $L_\infty$
MNIST | FGM($L_1$) | 100 | 29.6 | 2.42 | 0.57 | 36.5 | 51.2 | 3.99 | 0.8 | 0 | N.A. | N.A. | N.A.
| IFGM($L_1$) | 100 | 18.7 | 1.6 | 0.41 | 100 | 33.9 | 2.6 | 0.58 | 100 | 54.8 | 4.04 | 0.81
| EAD($L_1$) | 100 | 7.08 | 1.49 | 0.56 | 100 | 12.5 | 2.08 | 0.77 | 100 | 18.8 | 2.57 | 0.92
| ADMM($L_1$) | 100 | 6.0 | 2.07 | 0.97 | 100 | 10.61 | 2.72 | 0.99 | 100 | 16.6 | 3.41 | 1
CIFAR10 | FGM($L_1$) | 98.5 | 18.25 | 0.53 | 0.057 | 47 | 48.32 | 1.373 | 0.142 | 1 | 33.99 | 0.956 | 0.101
| IFGM($L_1$) | 100 | 6.28 | 0.184 | 0.21 | 100 | 13.72 | 0.394 | 0.44 | 100 | 22.84 | 0.65 | 0.74
| EAD($L_1$) | 100 | 2.44 | 0.31 | 0.084 | 100 | 6.392 | 0.6 | 0.185 | 100 | 10.21 | 0.865 | 0.31
| ADMM($L_1$) | 100 | 2.09 | 0.319 | 0.102 | 100 | 5.0 | 0.591 | 0.182 | 100 | 7.453 | 0.77 | 0.255
ImageNet | FGM($L_1$) | 12 | 229 | 0.73 | 0.028 | 1 | 67 | 0.165 | 0.08 | 0 | N.A. | N.A. | N.A.
| IFGM($L_1$) | 93 | 311 | 0.966 | 0.033 | 67 | 498.5 | 1.5 | 0.051 | 47 | 720.2 | 2.2 | 0.08
| EAD($L_1$) | 100 | 65.4 | 0.632 | 0.047 | 100 | 165.5 | 1.02 | 0.06 | 100 | 290 | 1.43 | 0.08
| ADMM($L_1$) | 100 | 56.1 | 0.904 | 0.053 | 100 | 92.7 | 1.15 | 0.0784 | 100 | 142.1 | 1.473 | 0.102
5.3. Attack Success Rate and Distortion for the ADMM $L_0$ Attack

The performance of the ADMM $L_0$ attack in terms of attack success rate and $L_0$ norm of the distortion is demonstrated in this section. The ADMM $L_0$ attack is compared with the C&W $L_0$ attack on MNIST and CIFAR10. 500 images are randomly selected from the test sets of MNIST and CIFAR10, respectively. Each image has 9 target labels and we perform 4500 attacks for each dataset using either the ADMM or the C&W attack.

For the ADMM $L_0$ attack, 9 binary search steps are performed to search for the parameter $\rho$ while $c$ is fixed to 20 for MNIST and 200 for CIFAR10. The initial value of $\rho$ is set to 3 for MNIST and 40 for CIFAR10, respectively. The number of ADMM iterations is 10. In each ADMM iteration, the Adam optimizer is utilized to solve the second subproblem with 1000 Adam iterations, with the learning rate set to 0.01 for MNIST and CIFAR10.

The results of the attacks are shown in Table 2. As observed from the table, both the C&W and ADMM attacks achieve a 100% attack success rate. For the best case, the C&W attack and the ADMM attack have relatively close performance in terms of $L_0$ distortion. For the worst case, the ADMM attack achieves lower $L_0$ distortion than C&W, reducing the $L_0$ distortion by up to 17% on MNIST. We also note that the differences between the C&W and ADMM attacks are smaller on CIFAR10 than on MNIST.
Table 4: $L_\infty$ attack comparison: ASR (%) and average $L_\infty$, $L_1$, and $L_2$ distortions of successful adversarial examples.

Data Set | Attack Method | Best Case | | | | Average Case | | | | Worst Case | | |
| | ASR | $L_\infty$ | $L_1$ | $L_2$ | ASR | $L_\infty$ | $L_1$ | $L_2$ | ASR | $L_\infty$ | $L_1$ | $L_2$
MNIST | FGM($L_\infty$) | 100 | 0.194 | 84.9 | 4.04 | 35 | 0.283 | 122.7 | 5.85 | 0 | N.A. | N.A. | N.A.
| IFGM($L_\infty$) | 100 | 0.148 | 50.9 | 2.48 | 100 | 0.233 | 71.2 | 3.44 | 100 | 0.378 | 96.8 | 4.64
| ADMM($L_\infty$) | 100 | 0.135 | 35.9 | 2.068 | 100 | 0.178 | 48 | 2.73 | 100 | 0.218 | 60.2 | 3.37
CIFAR10 | FGM($L_\infty$) | 100 | 0.015 | 42.8 | 0.78 | 53 | 0.48 | 136 | 2.5 | 1.5 | 0.31 | 712 | 14
| IFGM($L_\infty$) | 100 | 0.0063 | 14.36 | 0.28 | 100 | 0.015 | 26.2 | 0.54 | 100 | 0.026 | 37.7 | 0.826
| ADMM($L_\infty$) | 100 | 0.0061 | 12.8 | 0.25 | 100 | 0.0114 | 23.07 | 0.47 | 100 | 0.017 | 31.9 | 0.65
ImageNet | FGM($L_\infty$) | 20 | 0.0873 | 22372 | 43.55 | 1.5 | 0.0005 | 134 | 0.26 | 0 | N.A. | N.A. | N.A.
| IFGM($L_\infty$) | 100 | 0.0046 | 542.4 | 1.27 | 100 | 0.0128 | 1039.6 | 2.54 | 100 | 0.0253 | 1790.2 | 4.4
| ADMM($L_\infty$) | 100 | 0.0041 | 280.2 | 0.773 | 100 | 0.0059 | 427.7 | 1.10 | 100 | 0.0092 | 624.1 | 1.6
5.4. Attack Success Rate and Distortion for the ADMM $L_1$ Attack

We compare the ADMM $L_1$ attack with the FGM, IFGM and EAD [7] attacks. The attack success rate (ASR) and the average distortion of all successful adversarial examples are reported. We perform the adversarial attacks on MNIST, CIFAR10 and ImageNet. For MNIST and CIFAR10, 1000 correctly classified images are randomly selected from the test sets and 9 target labels are tested for each image, so we perform 9000 attacks for each dataset using each attack method. For ImageNet, 100 correctly classified images and 9 target labels are randomly selected.

The number of ADMM iterations is set to 80. In each ADMM iteration, the Adam optimizer is utilized to solve the second subproblem based on stochastic gradient descent. When using the Adam optimizer, we run 2000 learning iterations with an initial learning rate of 0.1 for MNIST and 0.001 for CIFAR10 and ImageNet. The parameter $\rho$ is fixed to 2 for MNIST, 40 for CIFAR10, and 200 for ImageNet. The parameter $c$ is fixed to 10 for MNIST, 300 for CIFAR10, and 2000 for ImageNet. Note that we do not perform binary search of $c$ or $\rho$, as fixed $c$ and $\rho$ can achieve good performance.

The results of the ADMM $L_1$ attack are shown in Table 3. We can observe that both the EAD and ADMM attacks achieve a 100% attack success rate, while the FGM attack performs poorly and the IFGM attack cannot guarantee 100% ASR on ImageNet. The ADMM attack achieves the best performance compared with the FGM, IFGM, and EAD attacks. As demonstrated in Table 3, the $L_1$ distortions of the ADMM and EAD attacks are relatively close in the best case, while the improvement of the ADMM attack over the EAD attack is much larger for the worst case. In the best case, the ADMM attack can craft adversarial examples with an $L_1$ norm about 14% smaller than that of the EAD attack on MNIST, CIFAR10 and ImageNet. For the worst case, the $L_1$ norm of the ADMM attack is about 28% lower on CIFAR10 and 50% lower on ImageNet compared with that of the EAD attack.
5.5. Attack Success Rate and Distortion for the ADMM $L_\infty$ Attack

The ADMM $L_\infty$ attack is compared with the FGM and IFGM attacks. The attack success rate (ASR) and the average distortion of all successful adversarial examples are reported. We perform the adversarial attacks on MNIST, CIFAR10 and ImageNet. For MNIST and CIFAR10, 1000 correctly classified images are randomly selected from the test sets and 9 target labels are tested for each image, so we perform 9000 attacks for each dataset using each attack method. For ImageNet, 100 correctly classified images and 9 target labels are randomly selected.

The parameter $\rho$ is fixed to 0.1. The number of ADMM iterations is 100 and the batch size is 90. In each ADMM iteration, the Adam optimizer is utilized to solve the first and second subproblems based on stochastic gradient descent. The Adam optimizer runs 1000 iterations to get the solution of the first subproblem, while it executes 2000 iterations to solve the second subproblem. Note that in the second subproblem, $c$ is fixed to 0.1, as we find that a fixed $c$ can achieve good performance and there is no need to perform binary search of $c$. The initial learning rate is set to 0.001 for MNIST and 0.002 for CIFAR10 and ImageNet. The attack transferability parameter $\kappa$ is set to 0 if we do not perform the transferability evaluation.

The results of the ADMM $L_\infty$ attack are demonstrated in Table 4. We can observe that both the IFGM and ADMM attacks achieve a 100% attack success rate, while FGM performs poorly. The ADMM attack achieves the best performance compared with the FGM and IFGM attacks. We also note that the $L_\infty$ norms of the ADMM and IFGM attacks are relatively close in the best case: the $L_\infty$ distortion of the ADMM attack is usually smaller than that of the IFGM attack by no more than 10% for the best case. In the worst case, the improvement of the ADMM attack over the IFGM attack is much more obvious. The $L_\infty$ distortion of the ADMM attack is about 40% smaller than that of the IFGM attack on the MNIST and CIFAR10 datasets for the worst case. On ImageNet, the $L_\infty$ norm of the ADMM attack is 64% lower than that of the IFGM attack.
5.6. ADMM Attack Against Defensive Distillation and Adversarial Training
The ADMM attacks can break undefended DNNs with a high success rate. They are also able to break DNNs protected by defensive distillation. We perform the C&W attack and the ADMM ℓ0, ℓ1, ℓ2, and ℓ∞ attacks for different temperature parameters on MNIST and CIFAR10. 500 randomly selected images are used as sources to generate 4500 adversarial examples (9 targets per image) on each of MNIST and CIFAR10. We find that the attack success rates of the C&W attack and the four ADMM attacks are 100% for every temperature. Since distillation at temperature T causes the logits to become approximately T times larger while their relative values remain unchanged, the C&W attack and the ADMM attacks, which operate on the relative values of the logits, do not fail.
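The temperature argument can be checked numerically: scaling the logits by a large constant saturates the softmax (which is what defeats probability-based attacks), but leaves the ordering and relative gaps of the logits intact, so margin-style objectives on the logits still provide a usable signal. A minimal illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, -1.0])
T = 100.0                    # stand-in for the distillation temperature

scaled = T * logits          # test-time logits are roughly T times larger

# Softmax saturates to a near one-hot vector: probability gradients vanish.
probs = softmax(scaled)

# But the logit ordering and relative gaps survive the scaling.
assert np.argmax(scaled) == np.argmax(logits)
```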
We further test the ADMM attack against adversarial training on MNIST. The C&W attack and the ADMM attack are used to separately generate 9000 adversarial examples each, with 1000 randomly selected images from the training set as sources. We then add the adversarial examples, with their correct labels, to the training dataset and retrain the network on the enlarged dataset. With the retrained networks, we perform the ADMM attack on the adversarially trained networks (one trained with C&W adversarial examples, and one trained with ADMM adversarial examples), as shown in Fig. 2. The ADMM attack breaks all three networks (the unprotected one, the one retrained with C&W adversarial examples, and the one retrained with ADMM adversarial examples) with a 100% success rate. The distortions on the latter two networks are higher than on the first network, showing some defense effect of adversarial training. We also note that the distortion on the third network is higher than on the second, which demonstrates the higher defense efficiency of adversarial training with ADMM adversarial examples (partly because the ADMM attack is stronger).
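The retraining protocol described above can be summarized as follows; `attack` and `train` are placeholders (the paper uses the ADMM or C&W attack and ordinary network training), and the counts are parameters rather than the paper's fixed 1000 sources and 9 targets.

```python
def adversarially_train(train_set, attack, train, n_sources, n_targets):
    """Augment the training set with correctly labeled adversarial examples,
    then retrain. train_set is a list of (x, y) pairs."""
    augmented = list(train_set)
    for x, y in train_set[:n_sources]:  # sources: correctly classified images
        for t in range(n_targets):
            x_adv = attack(x, t)        # targeted adversarial example
            augmented.append((x_adv, y))  # keep the *correct* label y
    return train(augmented)
```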
5.7. Attack Transferability
Here we test the transferability of the ADMM adversarial attack. For each value of the confidence parameter κ, we use the ADMM attack and the C&W attack to generate 9000 adversarial examples on MNIST, respectively. These examples are then applied to attack the defensively distilled network. The ASRs are reported in Fig. 3. As shown there, when κ is small, the ADMM attack can hardly succeed on the defensively distilled network, meaning the generated adversarial examples are not strong enough to break the defended network; low transferability is observed when κ is low. As κ increases, the ASRs of the three cases increase, demonstrating increasing transferability. At an intermediate value of κ, the ASRs of the three cases reach their maximum; the ASR of the average case is nearly 98%, meaning most of the adversarial examples generated on the undefended network can also break the defensively distilled network. Also note that beyond this point, the ASRs of the average and worst cases decrease as κ increases further. The reason is that it is quite difficult to generate adversarial examples, even for the undefended network, when κ is very large. Thus a decrease in the ASR is observed for the average and worst cases, and the advantage of strongly transferable adversarial examples is offset by the difficulty of generating such strong attacks. We also note that at large κ, the ASRs of the ADMM attack for the average and worst cases are higher than those of the C&W attack, demonstrating the higher transferability of the ADMM attack.
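The role of the confidence parameter can be made concrete with a C&W-style logit margin loss (a sketch of the kind of objective involved, not the paper's exact loss): the loss only bottoms out once the target logit exceeds every other logit by at least kappa, so a larger kappa forces a larger margin and hence stronger, more transferable examples, at the cost of being harder to satisfy.

```python
import numpy as np

def margin_loss(logits, target, kappa):
    # Largest non-target logit; the attack minimizes this loss, which
    # saturates (stops decreasing) only when logits[target] >= other + kappa.
    other = np.max(np.delete(logits, target))
    return max(other - logits[target], -kappa)
```

With logits [1, 3, 2] and target class 1, kappa = 0 is already satisfied (the loss sits at its floor of 0), while kappa = 5 yields a loss of -1 that has not yet reached its floor of -5, so the optimizer keeps enlarging the margin.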
6. Conclusion
In this paper, we propose an ADMM-based general framework for adversarial attacks. Under the ADMM framework, ℓ0, ℓ1, ℓ2, and ℓ∞ attacks are proposed and implemented. We compare the ADMM attacks with state-of-the-art adversarial attacks, showing that the ADMM attacks are the strongest to date. The ADMM attack is also applied to break two defense methods, defensive distillation and adversarial training. Experimental results show the effectiveness of the proposed ADMM attacks and their strong transferability.
References
 Athalye et al. [2018] Anish Athalye, Nicholas Carlini, and David Wagner. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018).
 Bhagoji et al. [2017] Arjun Nitin Bhagoji, Daniel Cullina, and Prateek Mittal. 2017. Dimensionality reduction as a defense against evasion attacks on machine learning classifiers. arXiv preprint arXiv:1704.02654 (2017).
 Boyd et al. [2011] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3, 1 (2011), 1–122.
 Carlini et al. [2016] Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, and Wenchao Zhou. 2016. Hidden Voice Commands. In USENIX Security Symposium. 513–530.
 Carlini and Wagner [2017] Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on. IEEE, 39–57.
 Chen et al. [2017] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. 2017. EAD: elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.04114 (2017).

 Deng et al. [2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 248–255.
 Dhillon et al. [2018] Guneet S Dhillon, Kamyar Azizzadenesheli, Zachary C Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, and Anima Anandkumar. 2018. Stochastic Activation Pruning for Robust Adversarial Defense. arXiv preprint arXiv:1803.01442 (2018).
 Dziugaite et al. [2016] Gintare Karolina Dziugaite, Zoubin Ghahramani, and Daniel M Roy. 2016. A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853 (2016).
 Feinman et al. [2017] Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. 2017. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410 (2017).
 Goodfellow et al. [2014] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
 Guo et al. [2017] Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. 2017. Countering Adversarial Images using Input Transformations. arXiv preprint arXiv:1711.00117 (2017).
 He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
 Hinton et al. [2012] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
 Hong and Luo [2017] Mingyi Hong and Zhi-Quan Luo. 2017. On the linear convergence of the alternating direction method of multipliers. Mathematical Programming 162, 1 (01 Mar 2017), 165–199. https://doi.org/10.1007/s10107-016-1034-2
 Kingma and Ba [2015] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR 2015. arXiv:1412.6980 http://arxiv.org/abs/1412.6980
 Krizhevsky and Hinton [2009] A. Krizhevsky and G. Hinton. 2009. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto (2009).

 Krizhevsky et al. [2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097–1105.
 Kurakin et al. [2016] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).
 Lecun et al. [1998] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (Nov 1998), 2278–2324. https://doi.org/10.1109/5.726791

 Makantasis et al. [2015] Konstantinos Makantasis, Konstantinos Karantzalos, Anastasios Doulamis, and Nikolaos Doulamis. 2015. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International. IEEE, 4959–4962.
 Nguyen et al. [2015] Anh Nguyen, Jason Yosinski, and Jeff Clune. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 427–436.
 Papernot et al. [2016a] Nicolas Papernot, Ian Goodfellow, Ryan Sheatsley, Reuben Feinman, and Patrick McDaniel. 2016a. cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768 (2016).
 Papernot et al. [2016b] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. 2016b. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on. IEEE, 372–387.
 Papernot et al. [2016c] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016c. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 582–597.
 Parikh et al. [2014] Neal Parikh, Stephen Boyd, et al. 2014. Proximal algorithms. Foundations and Trends® in Optimization 1, 3 (2014), 127–239.
 Silver et al. [2016] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
 Su et al. [2017] Jiawei Su, Danilo Vasconcellos Vargas, and Sakurai Kouichi. 2017. One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864 (2017).
 Szegedy et al. [2016] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2016. Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), 2818–2826.
 Szegedy et al. [2013] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
 Taigman et al. [2014] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, and Lior Wolf. 2014. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1701–1708.
 Tramèr et al. [2018] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. 2018. Ensemble Adversarial Training: Attacks and Defenses. In ICLR 2018. arXiv:1705.07204
 Wang and Banerjee [2014] Huahua Wang and Arindam Banerjee. 2014. Bregman Alternating Direction Method of Multipliers. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2816–2824. http://papers.nips.cc/paper/5612-bregman-alternating-direction-method-of-multipliers.pdf
 Xie et al. [2017] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. 2017. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991 (2017).