An ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural Networks

04/09/2018, by Pu Zhao et al.

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. That is, adversarial examples, obtained by adding delicately crafted distortions onto original legal inputs, can mislead a DNN to classify them as any target label. In a successful adversarial attack, the targeted mis-classification should be achieved with the minimal distortion added. In the literature, the added distortions are usually measured by the L0, L1, L2, and L∞ norms, and the corresponding attacks are named L0, L1, L2, and L∞ attacks, respectively. However, the literature lacks a versatile framework covering all types of adversarial attacks. This work for the first time unifies the methods of generating adversarial examples by leveraging ADMM (Alternating Direction Method of Multipliers), an operator splitting optimization approach, such that L0, L1, L2, and L∞ attacks can be effectively implemented by this general framework with only minor modifications. Compared with the state-of-the-art attacks in each category, our ADMM-based attacks are so far the strongest, achieving both a 100% success rate and the minimal distortion.


1. Introduction

Deep learning has demonstrated exceptional performance on several categories of machine learning problems and has been applied in many settings [8; 19; 32; 14; 15; 28; 22]. However, it has recently been found that deep neural networks (DNNs) can be vulnerable to adversarial attacks [5; 23; 20], which raises concerns about applying deep learning in security-critical tasks. Adversarial attacks are implemented by generating adversarial examples, which are crafted by adding delicate distortions onto legal inputs. Fig. 1 shows adversarial examples for targeted adversarial attacks that can fool DNNs.

The security properties of deep learning have been investigated from two aspects: (i) enhancing the robustness of DNNs under adversarial attacks and (ii) crafting adversarial examples to test the vulnerability of DNNs. For the former aspect, research has been conducted by either filtering out added distortions [13; 3; 10; 35] or revising DNN models [26; 9; 11] to defend against adversarial attacks. For the latter aspect, adversarial examples have been generated heuristically [12; 29], iteratively [25; 20; 16; 34], or by solving optimization problems [31; 6; 7; 2]. These two aspects mutually benefit each other towards hardening DNNs under adversarial attacks. Our work deals with the problem from the latter aspect.

For targeted adversarial attacks, the crafted adversarial examples should be able to mislead the DNN to classify them as any target label, as done in Fig. 1. Also, in a successful adversarial attack, the targeted mis-classification should be achieved with the minimal distortion added to the original legal input. This raises the question of how to measure the added distortions. Currently, in the literature, the L0, L1, L2, and L∞ norms are used to measure the added distortions, and the corresponding attacks are respectively named L0, L1, L2, and L∞ adversarial attacks. Even though no measure can be perfect for human perceptual similarity, these measures or attack types may be employed for different application specifications. This work bridges the literature gap by unifying all types of attacks within a single framework.

(a) MNIST
(b) CIFAR-10
Figure 1. Adversarial examples generated by our ADMM L0, L1, L2, and L∞ attacks for the MNIST (left) and CIFAR-10 (right) datasets. The leftmost column contains the original legal inputs. The next four columns are the corresponding adversarial examples crafted using our ADMM L0, L1, L2, and L∞ attacks, respectively. While the original inputs are correctly classified by the DNN, the adversarial examples mislead the DNN to classify them as the specified target labels.

In order to benchmark DNN defense techniques and to push the limit of the DNN security level, we should develop the strongest adversarial attacks. For this purpose, we adopt the white-box attack assumption, in which the attackers have complete information about the DNN architectures and all the parameters. This is also a realistic assumption, because even with only black-box access to the DNN model, one can train a substitute model and transfer the attacks generated using the substitute model. For the same purpose, we adopt the optimization-based approach to generate adversarial examples. The objectives of the optimization problem are (i) misleading the DNN to classify the adversarial example as a target label and (ii) minimizing the norm of the added distortion.

By leveraging ADMM (Alternating Direction Method of Multipliers) [4], an operator splitting optimization approach, we provide a universal framework for L0, L1, L2, and L∞ adversarial attacks. ADMM decomposes an original optimization problem into two correlated subproblems, each of which can be solved more efficiently or analytically, and then coordinates the solutions to the subproblems to construct a solution to the original problem. This decomposition-alternating procedure of ADMM blends the benefits of dual decomposition and augmented Lagrangian methods for solving problems with non-convex and combinatorial constraints. Therefore, ADMM introduces no additional sub-optimality beyond the gradient-based backpropagation method commonly used in DNNs, and it provides faster linear convergence than state-of-the-art iterative attacks [25; 20; 16; 34]. We also compare with the optimization-based approaches, i.e., the Carlini & Wagner (C&W) attack [6] and the Elastic-net (EAD) attack [7], which are currently the strongest attacks in the literature.

The major contributions of this work and its differences from C&W and EAD attacks are summarized as follows:

  • With our ADMM-based universal framework, all of the L0, L1, L2, and L∞ adversarial attacks can be implemented with only minor modifications, whereas C&W only performs L0, L2, and L∞ attacks and EAD only performs L1 and L2 attacks.

  • The C&W L0 attack needs to run the C&W L2 attack iteratively to find the pixels with the least effect and fix them, thereby identifying a minimal subset of pixels for modification to generate an adversarial example.

  • The C&W L∞ attack, implemented naively with gradient descent, may produce very poor initial results. C&W address this issue by introducing a limit on the L∞ norm and reducing the limit iteratively.

  • The EAD attack minimizes a weighted sum of the L1 and L2 norms. However, a universal attack generation model is missing.

  • Our extensive experiments show that our attacks are so far the strongest. Besides the 100% attack success rate, our ADMM-based attacks outperform C&W and EAD in each attack type in terms of minimal distortion.

Besides comparing with C&W, EAD and other attacks, we also test our attacks against defenses such as defensive distillation [26] and adversarial training [33], demonstrating the success of our attacks. In addition, we validate the transferability of our attacks onto different DNN models. The codes of our attacks to reproduce the results will be made available online upon publication of this work.

2. Related Work

We introduce the most representative attacks and defenses in this section.

2.1. Adversarial Attacks

L-BFGS Attack [31] is the first optimization-based attack; it is an L2 attack that uses the L2 norm to measure the distortion in the optimization objective function.

JSMA Attack [25] is an L0 attack that uses a greedy algorithm to pick the most influential pixels by calculating a Jacobian-based saliency map, and it modifies the pixels iteratively. Its computational complexity is prohibitive when applied to large-scale datasets such as ImageNet.

FGSM [12] and IFGSM [20] Attacks are L∞ attacks that utilize the gradient of the loss function to determine the direction in which to modify the pixels. They are designed to be fast, rather than optimal. They can be used for adversarial training by directly changing the loss function instead of explicitly injecting adversarial examples into the training data. The fast gradient method (FGM) and the iterative fast gradient method (IFGM) are generalizations of FGSM and IFGSM, respectively, that can be fitted as L1, L2, and L∞ attacks.

C&W Attacks [6] are a series of L0, L2, and L∞ attacks that achieve a 100% attack success rate with much lower distortions compared with the above-mentioned attacks. In particular, the C&W L2 attack is superior to the L-BFGS attack (which is also an L2 attack) because it uses a better objective function.

EAD Attack [7] formulates the process of crafting adversarial examples as an elastic-net regularized optimization problem. Elastic-net regularization is a linear mixture of the L1 and L2 norms used in the penalty function. The EAD attack is able to craft L1-oriented adversarial examples and includes the C&W L2 attack as a special case.

2.2. Representative Defenses

Defensive Distillation [26] introduces a temperature T into the softmax layer and uses a higher temperature for training and a lower temperature for testing. The training phase first trains a teacher model that produces soft labels for the training dataset and then trains a distilled model using the training dataset with the soft labels. The distilled model with reduced temperature is preserved for testing.
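To make the temperature mechanism concrete, here is a minimal NumPy sketch of a temperature-scaled softmax (the function name and example values are ours, not from the original implementation); a higher T yields the softer label distributions used to train the distilled model.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: higher T produces softer probability vectors."""
    scaled = np.asarray(logits, dtype=float) / T
    scaled -= scaled.max()          # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax_with_temperature(logits, T=1.0))    # peaked distribution
print(softmax_with_temperature(logits, T=20.0))   # much softer "soft labels"
```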

Adversarial Training [33] injects adversarial examples with correct labels into the training dataset and then retrains the neural network, thus increasing the robustness of DNNs under adversarial attacks.

3. An ADMM-Based Universal Framework for Adversarial Attacks

ADMM was first introduced in the mid-1970s with roots in the 1950s, and the algorithm and theory had been established by the mid-1990s. It was recently popularized by S. Boyd et al. for statistics and machine learning problems with a very large number of features or training examples [4]. The ADMM method takes the form of a decomposition-alternating procedure, in which the solutions to small local subproblems are coordinated to find a solution to a large global problem. It can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization.

ADMM was developed in part to bring robustness to the dual ascent method and, in particular, to yield convergence without assumptions such as strict convexity or finiteness of the objective. ADMM is also capable of dealing with combinatorial constraints due to its decomposition property. It can be used in many practical applications where the convexity of the objective cannot be guaranteed or where the problem has combinatorial constraints. Besides, it converges quickly in many cases since the two blocks of variables are updated in an alternating or sequential fashion, which accounts for the term alternating direction.

3.1. Notations and Definitions

In this paper, we mainly evaluate the adversarial attacks on image classification tasks. A two-dimensional matrix x ∈ R^{h×w} represents a gray-scale image with height h and width w. For a colored RGB image with three channels, a three-dimensional tensor x ∈ R^{h×w×3} is utilized. Each element x_i represents the value of the i-th pixel and is scaled to the range [0, 1]. A neural network is the model y = F(x), which generates an output y given an input x. The model F is fixed since we perform attacks on given neural network models.

The output layer performs the softmax operation and the neural network is an m-class classifier. Let the logits Z(x) denote the input to the softmax layer, i.e., the output of all layers except the softmax layer. We have F(x) = softmax(Z(x)) = y. The i-th element y_i of the output vector y represents the probability that input x belongs to the i-th class. The output vector y is treated as a probability distribution, and its elements satisfy 0 ≤ y_i ≤ 1 and Σ_i y_i = 1. The neural network classifies input x according to the maximum probability, i.e., C(x) = argmax_i y_i.
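As a concrete reference for this notation, the following minimal NumPy sketch (helper name ours) computes y = softmax(Z(x)) and the predicted class C(x) = argmax_i y_i from a given logits vector; note that the argmax of the softmax output equals the argmax of the logits.

```python
import numpy as np

def classify(logits):
    """Return C(x) = argmax_i y_i and y = softmax(Z(x)) given the logits Z(x)."""
    z = np.asarray(logits, dtype=float)
    y = np.exp(z - z.max())
    y /= y.sum()                 # each y_i is in [0, 1] and sum_i y_i = 1
    return int(np.argmax(y)), y

label, probs = classify([0.3, 2.5, -1.0])
print(label, probs)              # predicted class 1, probability vector y
```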

An adversarial attack can be either targeted or untargeted. Given an original legal input x0 with its correct label t0, the untargeted adversarial attack is to find an input x satisfying C(x) ≠ t0 while x and x0 are close according to some measure of the distortion. The untargeted adversarial attack does not specify any target label to mislead the classifier. In the targeted adversarial attack, with a given target label t ≠ t0, an adversarial example is an input x such that C(x) = t while x and x0 are close according to some measure of the distortion. In this work, we consider targeted adversarial attacks since they are believed to be stronger than untargeted attacks.

3.2. General ADMM Framework for Adversarial Attacks

The initial problem of constructing adversarial examples is defined as follows: given an original legal input image x0 and a target label t ≠ C(x0), find an adversarial example x = x0 + δ such that D(δ) is minimized, C(x0 + δ) = t, and x0 + δ ∈ [0, 1]^n. Here δ is the distortion added onto the input x0, C is the classification function of the neural network, and the adversarial example x0 + δ is classified as the target label t.

D(δ) is a measure of the distortion δ. We need to measure the distortion between the original legal input x0 and the adversarial example x = x0 + δ. Lp norms are the most commonly used measures in the literature. The Lp norm of the distortion δ = x − x0 is defined as:

(1)   ||δ||_p = ( Σ_i |δ_i|^p )^{1/p}

We see the use of the L0, L1, L2, and L∞ norms in different attacks.

  • L0 norm: measures the number of mismatched elements between x and x0.

  • L1 norm: measures the sum of the absolute values of the differences between x and x0.

  • L2 norm: measures the standard Euclidean distance between x and x0.

  • L∞ norm: measures the maximum difference between x_i and (x0)_i over all pixels i.

In this work, with a general ADMM-based framework, we implement the L0, L1, L2, and L∞ attacks, respectively. When generating adversarial examples for the four attacks, D(δ) in the objective function becomes the L0, L1, L2, and L∞ norm, respectively. For simplicity of expression, the general ADMM-based framework uses the form D(δ) to denote the measure of δ; when introducing the four specific attacks based on the ADMM framework, we use the corresponding Lp norm to represent the distortion measure.
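For concreteness, the following small NumPy sketch (function name ours) computes the four distortion measures for a given pair of images; it is only an illustration of the definitions above.

```python
import numpy as np

def distortion_norms(x, x0):
    """Measure the distortion delta = x - x0 under the four norms used in this work."""
    delta = (np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)).ravel()
    return {
        "L0":   int(np.count_nonzero(delta)),          # number of modified pixels
        "L1":   float(np.abs(delta).sum()),            # total absolute change
        "L2":   float(np.sqrt((delta ** 2).sum())),    # Euclidean distance
        "Linf": float(np.abs(delta).max()),            # largest per-pixel change
    }

x0 = np.zeros((3, 3))
x = x0.copy()
x[0, 0], x[1, 2] = 0.5, -0.25
print(distortion_norms(x, x0))   # {'L0': 2, 'L1': 0.75, 'L2': ~0.559, 'Linf': 0.5}
```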

ADMM provides a systematic way to deal with non-convex and combinatorial constraints by breaking the initial problem into two subproblems. To do this, the initial problem is first transformed into the following problem by introducing an auxiliary variable z:

(2)   minimize_{z, δ}  D(z) + f(x0 + δ)    subject to  z = δ,  x0 + δ ∈ [0, 1]^n

where f has the form:

(3)

Here Z(x) denotes the logits before the softmax layer and Z(x)_i its i-th element. The function f ensures that the input x0 + δ is classified as the target label t. The augmented Lagrangian function of problem (2) is as follows:

(4)   L_ρ(z, δ, s) = D(z) + f(x0 + δ) + s^T (z − δ) + (ρ/2) ||z − δ||_2^2

where s is the dual variable (Lagrange multiplier) and ρ > 0 is the penalty parameter. Using the scaled form of ADMM by defining u = (1/ρ) s, we have:

(5)   L_ρ(z, δ, u) = D(z) + f(x0 + δ) + (ρ/2) ||z − δ + u||_2^2 − (ρ/2) ||u||_2^2

ADMM solves problem (2) through iterations. In the k-th iteration, the following steps are performed:

(6)   z^{k+1} = argmin_z  L_ρ(z, δ^k, u^k)
(7)   δ^{k+1} = argmin_δ  L_ρ(z^{k+1}, δ, u^k)
(8)   u^{k+1} = u^k + z^{k+1} − δ^{k+1}

In Eqn. (6), we find the z that minimizes L_ρ with δ and u fixed. Similarly, in Eqn. (7), z and u are fixed and we find the δ that minimizes L_ρ. The dual variable u is then updated in Eqn. (8). Note that the two variables z and δ are updated in an alternating or sequential fashion, from which the term alternating direction comes. The iterations converge when:

(9)   ||z^{k+1} − δ^{k+1}||_2 ≤ ε^{pri}   and   ρ ||δ^{k+1} − δ^k||_2 ≤ ε^{dual}

Equivalently, in each iteration, we solve two optimization subproblems corresponding to Eqns. (6) and (7), respectively:

(10)   z^{k+1} = argmin_z  D(z) + (ρ/2) ||z − δ^k + u^k||_2^2

and

(11)   δ^{k+1} = argmin_δ  f(x0 + δ) + (ρ/2) ||z^{k+1} − δ + u^k||_2^2

The non-differentiable f makes it difficult to solve the second subproblem (11). Therefore, a new differentiable f inspired by [6] is utilized as follows:

(12)   f(x0 + δ) = c · max{ max_{j≠t} Z(x0 + δ)_j − Z(x0 + δ)_t , −κ }

Then, stochastic gradient descent methods can be used to solve this subproblem. The Adam optimizer [17] is applied due to its fast and robust convergence behavior. In the new f of Eqn. (12), κ is a confidence parameter denoting the strength of adversarial example transferability: the larger κ, the stronger the transferability of the adversarial example. It can be kept at 0 if we do not evaluate transferability.
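To illustrate the alternating structure of Eqns. (6)-(8), the following minimal NumPy sketch runs scaled-form ADMM on a toy problem, with the L1 norm as D and a simple quadratic standing in for the attack loss f. It is an illustration of the update pattern only, not the attack itself, and all values are made up.

```python
import numpy as np

def admm_sketch(rho=1.0, iters=50):
    """Scaled-form ADMM for: minimize ||z||_1 + f(delta) subject to z = delta,
    with a toy quadratic f(delta) = 0.5 * ||delta - b||_2^2 standing in for the attack loss."""
    b = np.array([1.5, -0.2, 0.05, -3.0])
    z = np.zeros_like(b)
    delta = np.zeros_like(b)
    u = np.zeros_like(b)
    for _ in range(iters):
        # z-update (Eqn. 6 / subproblem 10): proximal step of the L1 norm = soft thresholding
        a = delta - u
        z = np.sign(a) * np.maximum(np.abs(a) - 1.0 / rho, 0.0)
        # delta-update (Eqn. 7 / subproblem 11): closed form here;
        # in the attack this subproblem is solved with the Adam optimizer instead
        delta = (b + rho * (z + u)) / (1.0 + rho)
        # dual update (Eqn. 8)
        u = u + z - delta
    return z, delta

z, delta = admm_sketch()
print(z, delta, np.abs(z - delta).max())   # the primal residual shrinks toward 0
```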

3.3. Box Constraint

The constraint on x0 + δ, i.e., x0 + δ ∈ [0, 1]^n, is known as a "box constraint" in the optimization literature. We introduce a new variable w and, instead of optimizing over δ directly, we optimize over w, based on:

(13)   δ = (1/2) (tanh(w) + 1) − x0

Here the tanh is performed elementwise. Since −1 ≤ tanh(w) ≤ 1, it follows that 0 ≤ x0 + δ ≤ 1, so the method automatically satisfies the box constraint and allows us to use optimization algorithms that do not natively support box constraints.
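The change of variables of Eqn. (13) can be sketched in a few lines of NumPy (function name ours); any real-valued w maps to a valid image in [0, 1]^n, so the optimizer never has to handle the box constraint explicitly.

```python
import numpy as np

def box_free_variables(w, x0):
    """Change of variables of Eqn. (13): an unconstrained w always yields a valid image."""
    x_adv = 0.5 * (np.tanh(w) + 1.0)   # elementwise, always inside (0, 1)
    delta = x_adv - x0                 # distortion implied by w
    return x_adv, delta

x0 = np.random.rand(4)
w = np.random.randn(4) * 5.0           # unconstrained optimization variable
x_adv, delta = box_free_variables(w, x0)
assert np.all((x_adv >= 0.0) & (x_adv <= 1.0))
```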

3.4. Selection of Target Label

For targeted attacks, there are different ways to choose the target labels:

  • Average Case: select at random the target label uniformly among all the labels that are not the correct label.

  • Best Case: perform attacks using all incorrect labels, and report the target label that is the least difficult to attack.

  • Worst Case: perform attacks using all incorrect labels, and report the target label that is the most difficult to attack.

We evaluate the performance of the proposed ADMM attacks in the three cases mentioned above.

3.5. Discussion on Constants

There are two constants, c and ρ, in the two subproblems (10) and (11). Different policies are adopted for choosing appropriate c and ρ in the L0, L1, L2, and L∞ attacks. In the L2 attack, since ρ acts in both subproblems (10) and (11), we fix ρ and change c to improve the solutions. We find that the best choice of c is the smallest one that achieves a successful attack (i.e., the adversarial example is classified as the target label) in subproblem (11). Thus, a modified binary search is used to find such a c. For the ADMM L0 attack, as ρ has a stronger and more direct influence on the solutions, c is fixed and an adaptive search of ρ is utilized. More details are provided in Section 4.2. For the ADMM L1 and L∞ attacks, as we find that fixed c and ρ achieve good performance, c and ρ are kept unchanged and the adaptive search method is not used.
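One possible form of the modified binary search on c is sketched below. This is a hedged illustration only: the callback attack_succeeds and the specific update rule are our assumptions, standing in for a full run of the ADMM attack with a given constant c.

```python
def search_smallest_c(attack_succeeds, c_init=1e-3, c_max=1e10, steps=9):
    """Modified binary search for the smallest constant c that yields a successful attack.
    `attack_succeeds(c)` is a placeholder: it should run the attack with constant c
    and report whether the targeted misclassification was achieved."""
    lo, hi, c = 0.0, None, c_init
    best = None
    for _ in range(steps):
        if attack_succeeds(c):
            best, hi = c, c            # success: try a smaller c
            c = (lo + hi) / 2.0
        else:
            lo = c                     # failure: c was too small
            c = c * 10.0 if hi is None else (lo + hi) / 2.0
        if c > c_max:
            break
    return best                        # None if no tested c succeeded

# Toy oracle: attacks succeed once c >= 0.05; the search converges near that value.
print(search_smallest_c(lambda c: c >= 0.05))
```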

4. Instantiations of L0, L1, L2, and L∞ Attacks based on ADMM Framework

The ADMM framework for adversarial attacks now needs to solve the two subproblems (10) and (11). The difference between the L0, L1, L2, and L∞ attacks lies in subproblem (10), while the process of solving subproblem (11) based on stochastic gradient descent is essentially the same for the four attacks.

4.1. L2 Attack

For the L2 attack, subproblem (10) has the form:

(14)   min_z  ||z||_2^2 + (ρ/2) ||z − δ^k + u^k||_2^2

the solution to which can be directly derived in an analytical form:

(15)   z^{k+1} = ρ / (ρ + 2) · (δ^k − u^k)

Then the complete solution to the L2 attack problem using the ADMM framework is as follows: in the k-th iteration,

(16)   z^{k+1} = ρ / (ρ + 2) · (δ(w^k) − u^k)
(17)   w^{k+1} = argmin_w  f(x0 + δ(w)) + (ρ/2) ||z^{k+1} − δ(w) + u^k||_2^2
(18)   u^{k+1} = u^k + z^{k+1} − δ(w^{k+1})

where δ(w) = (1/2)(tanh(w) + 1) − x0 as in Eqn. (13). Eqn. (16) corresponds to the analytical solution of subproblem (10), i.e., problem (14), with Eqn. (13) replacing δ in Eqn. (15). Eqn. (17) corresponds to subproblem (11) with Eqn. (13) replacing δ and f taking the form of Eqn. (12). The solution to Eqn. (17) is obtained with the Adam optimizer using stochastic gradient descent.
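The following self-contained PyTorch sketch illustrates one possible implementation of the L2 attack iteration in the spirit of Eqns. (16)-(18). It is an illustration under assumptions of ours, not the authors' code: the tiny linear model stands in for a real DNN's logits Z(x), the distortion measure is taken as the squared L2 norm, and the target label and hyperparameters are arbitrary rather than the settings used in the experiments.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(784, 10)         # stand-in for the logits Z(x) of a real DNN
x0 = torch.rand(784)                      # original image, pixels in [0, 1]
t = 3                                     # target label
rho, c, kappa = 20.0, 0.5, 0.0

def f_loss(x_adv):
    """Attack loss of Eqn. (12): c * max(max_{j != t} Z_j - Z_t, -kappa)."""
    logits = model(x_adv)
    other = torch.max(torch.cat([logits[:t], logits[t + 1:]]))
    return c * torch.clamp(other - logits[t], min=-kappa)

def delta_of(w):
    return 0.5 * (torch.tanh(w) + 1.0) - x0           # box constraint, Eqn. (13)

w = torch.atanh((2 * x0 - 1).clamp(-0.9999, 0.9999)).clone().requires_grad_(True)
z = torch.zeros_like(x0)
u = torch.zeros_like(x0)

for k in range(10):                                    # ADMM iterations
    # z-update, Eqn. (16): closed form for min_z ||z||_2^2 + rho/2 ||z - delta + u||_2^2
    with torch.no_grad():
        z = rho / (rho + 2.0) * (delta_of(w) - u)
    # w-update, Eqn. (17): Adam on f(x0 + delta(w)) + rho/2 ||z - delta(w) + u||_2^2
    opt = torch.optim.Adam([w], lr=0.02)
    for _ in range(200):
        opt.zero_grad()
        d = delta_of(w)
        loss = f_loss(x0 + d) + 0.5 * rho * torch.sum((z - d + u) ** 2)
        loss.backward()
        opt.step()
    # dual update, Eqn. (18)
    with torch.no_grad():
        u = u + z - delta_of(w)

x_adv = x0 + delta_of(w).detach()
print("classified as target:", int(model(x_adv).argmax()) == t,
      "L2 distortion:", float(torch.norm(x_adv - x0)))
```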

4.2. L0 Attack

For the L0 attack, subproblem (10) has the form:

(19)   min_z  ||z||_0 + (ρ/2) ||z − δ^k + u^k||_2^2

Writing a = δ^k − u^k, its equivalent optimization problem is as follows:

(20)   min_z  ||z||_0 + (ρ/2) ||z − a||_2^2

The solution to problem (19) is then recovered directly from the solution to problem (20). The solution to problem (20) can be derived in this way: first let z be equal to a, then, for each element of z, if its square is smaller than 2/ρ, set it to zero. A proof of this solution is given in the following.

Lemma 4.1.

Suppose that two matrices A and B are of the same size, and that there are at least k zero elements in B. Then the optimal value of the following problem is the sum of the squares of the k smallest-magnitude elements of A.

(21)

The proof of the lemma is straightforward and we omit it for the sake of brevity. We use S_k(A) to denote the sum of the k smallest values of a_i^2, where a_i is an element of A.

Theorem 4.2.

Set z = a = δ^k − u^k and then set to zero those elements of z whose squares are smaller than 2/ρ. Such a z yields the minimum objective value of problem (20).

Proof.

Suppose that z* is constructed according to the rule in Theorem 4.2 and has m elements equal to 0. We need to prove that z* is the optimal solution with the minimum objective value. Suppose we have another arbitrary solution z' with n elements equal to 0. Both z* and z' have N elements in total. The objective value of the solution z* is:

(22)

The objective value of the solution z' is:

(23)

The inequality in Eqn. (23) holds due to Lemma 4.1.

If n ≥ m, then according to the definition of z*, we have

(24)

So that

(25)

If n < m, then according to the definition of z*, we have

(26)

So that

(27)

Thus, we can see that our solution z* achieves the minimum objective value and is the optimal solution. ∎

When solving subproblem (19) according to Theorem 4.2, we enforce a hidden constraint on the distortion z: the square of each non-zero element of z must be larger than 2/ρ. Therefore, a smaller ρ pushes the ADMM method to find a z with larger non-zero elements, thus reducing the number of non-zero elements and decreasing the L0 norm. Empirically, we find that the constant ρ represents a trade-off between attack success rate and the L0 norm of the distortion, i.e., a larger ρ can help find solutions with a higher attack success rate at the cost of a larger L0 norm.

Then the complete solution to the L0 attack problem using the ADMM framework is derived similarly to the L2 attack. More specifically, in each iteration, Theorem 4.2 is applied to obtain the optimal z^{k+1}. Then we solve Eqn. (17) with the Adam optimizer and update the dual variable through Eqn. (18).
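A minimal NumPy sketch of the z-update implied by Theorem 4.2 follows (function name ours); it keeps an element of a = δ^k − u^k only if its square is at least 2/ρ, i.e., only if keeping it costs less than zeroing it under problem (20).

```python
import numpy as np

def l0_z_update(delta, u, rho):
    """L0 subproblem solution per Theorem 4.2: hard-threshold a = delta - u at 2/rho."""
    a = delta - u
    z = a.copy()
    z[a ** 2 < 2.0 / rho] = 0.0
    return z

a_example = np.array([0.9, 0.05, -0.4, 0.001])
print(l0_z_update(a_example, np.zeros(4), rho=20.0))   # threshold 2/rho = 0.1 on squares
```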

4.3. L1 Attack

For the L1 attack, subproblem (10) has the form:

(28)   min_z  ||z||_1 + (ρ/2) ||z − δ^k + u^k||_2^2

Writing a = δ^k − u^k, the problem becomes

(29)   min_z  ||z||_1 + (ρ/2) ||z − a||_2^2

The solution of problem (29) is given by the soft thresholding operator evaluated at the point a with parameter 1/ρ [27],

(30)   S_{1/ρ}(a)_i = sign(a_i) · max( |a_i| − 1/ρ, 0 )

where the operations are taken elementwise, and sign(a_i) = 1 if a_i ≥ 0 and −1 otherwise. Therefore, the solution to problem (28) is given by

(31)   z^{k+1} = S_{1/ρ}(δ^k − u^k)

The complete solution to the L1 attack problem using the ADMM framework is similar to that of the L2 attack. In each iteration, we obtain the closed-form solution (31) of the first subproblem (28), then the Adam optimizer is utilized to solve the second subproblem (17), and finally the dual variable is updated through Eqn. (18).
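The soft-thresholding update of Eqns. (30)-(31) can be sketched as follows (function names ours).

```python
import numpy as np

def soft_threshold(b, lam):
    """Soft thresholding operator S_lam(b)_i = sign(b_i) * max(|b_i| - lam, 0), elementwise."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def l1_z_update(delta, u, rho):
    """Closed-form L1 subproblem solution: soft-threshold delta - u with parameter 1/rho."""
    return soft_threshold(delta - u, 1.0 / rho)

print(l1_z_update(np.array([0.8, -0.03, 0.2]), np.zeros(3), rho=10.0))  # [0.7, 0., 0.1]
```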

4.4. L∞ Attack

For the L∞ attack, subproblem (10) has the form:

(32)   min_z  ||z||_∞ + (ρ/2) ||z − δ^k + u^k||_2^2

This problem does not have a closed-form solution. One possible method is to derive the KKT conditions of problem (32) [27]. Here, we instead use stochastic gradient descent to solve it. In our experiments, we find that the Adam optimizer [17] achieves fast and robust convergence, so the Adam optimizer is utilized to solve Eqn. (32). Since Eqn. (32) is relatively simple compared with Eqn. (17), the complexity of solving Eqn. (32) with the Adam optimizer is negligible.

The complete solution to the L∞ attack problem using the ADMM framework is derived similarly to the L2 attack. In the k-th iteration, we first use the Adam optimizer to obtain the optimal z^{k+1} in Eqn. (32). Then we solve Eqn. (17) and update the dual variable through Eqn. (18), as in the L2 attack.
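A minimal PyTorch sketch of solving the L∞ subproblem (32) with the Adam optimizer is given below (function name and values ours); the z.abs().max() term is the L∞ norm and the quadratic term is the ADMM penalty.

```python
import torch

def linf_z_update(delta, u, rho, steps=1000, lr=0.01):
    """Approximately solve  min_z ||z||_inf + rho/2 ||z - (delta - u)||_2^2  with Adam,
    since this subproblem has no simple closed form."""
    a = (delta - u).detach()
    z = a.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = z.abs().max() + 0.5 * rho * torch.sum((z - a) ** 2)
        loss.backward()
        opt.step()
    return z.detach()

a = torch.tensor([0.5, -0.45, 0.1, 0.0])
print(linf_z_update(a, torch.zeros(4), rho=0.1))   # the largest entries are shrunk most
```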

5. Performance Evaluation

Data Set | Attack Method | Best Case (ASR, L2, L1, L∞) | Average Case (ASR, L2, L1, L∞) | Worst Case (ASR, L2, L1, L∞)
MNIST | FGM(L2) | 99.4, 2.245, 25.84, 0.574 | 34.6, 3.284, 39.15, 0.747 | 0, N.A., N.A., N.A.
MNIST | IFGM(L2) | 100, 1.58, 18.51, 0.388 | 99.9, 2.50, 32.63, 0.562 | 99.6, 3.958, 55.04, 0.783
MNIST | C&W(L2) | 100, 1.393, 13.57, 0.402 | 100, 2.002, 22.31, 0.54 | 99.9, 2.598, 31.43, 0.689
MNIST | ADMM(L2) | 100, 1.288, 13.87, 0.345 | 100, 1.873, 22.52, 0.498 | 100, 2.445, 31.427, 0.669
CIFAR-10 | FGM(L2) | 99.5, 0.421, 14.13, 0.05 | 42.8, 1.157, 39.5, 0.136 | 0.7, 3.115, 107.1, 0.369
CIFAR-10 | IFGM(L2) | 100, 0.191, 6.549, 0.022 | 100, 0.432, 15.13, 0.047 | 100, 0.716, 25.22, 0.079
CIFAR-10 | C&W(L2) | 100, 0.178, 6.03, 0.019 | 100, 0.347, 12.115, 0.0364 | 99.9, 0.481, 16.75, 0.0536
CIFAR-10 | ADMM(L2) | 100, 0.173, 5.8, 0.0192 | 100, 0.337, 11.65, 0.0365 | 100, 0.476, 16.73, 0.0535
ImageNet | FGM(L2) | 12, 2.29, 752.9, 0.087 | 1, 6.823, 2338, 0.25 | 0, N.A., N.A., N.A.
ImageNet | IFGM(L2) | 100, 1.057, 349.55, 0.034 | 100, 2.461, 823.52, 0.083 | 98, 4.448, 1478.8, 0.165
ImageNet | C&W(L2) | 100, 0.48, 142.4, 0.016 | 100, 0.681, 215.4, 0.03 | 100, 0.866, 275.4, 0.042
ImageNet | ADMM(L2) | 100, 0.416, 117.3, 0.015 | 100, 0.568, 177.6, 0.022 | 97, 0.701, 229.08, 0.0322
Table 1. Adversarial attack success rate (ASR, %) and L2, L1, and L∞ distortions of different L2 attacks for different datasets

The proposed ADMM attacks are compared with state-of-the-art attacks, including the C&W attacks [6], the EAD attack [7], and the FGM and IFGM attacks, on three image classification datasets: MNIST [21], CIFAR-10 [18] and ImageNet [8]. We also test our attacks against two defenses, defensive distillation [26] and adversarial training [33], and evaluate the transferability of the ADMM attacks.

5.1. Experiment Setup and Parameter Setting

Our experiment setup is based on the C&W attack setup for fair comparison. Two networks are trained for the MNIST and CIFAR-10 datasets, respectively. For the ImageNet dataset, a pre-trained network is utilized. The network architecture for MNIST and CIFAR-10 has four convolutional layers, two max pooling layers, two fully connected layers and a softmax layer. It achieves 99.5% accuracy on MNIST and 80% accuracy on CIFAR-10. For ImageNet, a pre-trained Inception v3 network [30] is applied, so there is no need to train our own model. The Google Inception model achieves 96% top-5 accuracy with image inputs of size 299×299×3. All experiments are conducted on machines with an Intel i7-7700K CPU, 32 GB RAM and an NVIDIA GTX 1080 TI GPU.

The implementations of FGM and IFGM are based on the CleverHans package [24]. The key distortion parameter ε is determined through a fine-grained grid search; for each image, the smallest ε in the grid leading to a successful attack is reported. For IFGM, we perform 10 FGM iterations, with the per-iteration distortion parameter set accordingly, which has been shown to be quite effective in [33].

The implementations of the C&W attacks and the EAD attack are based on the GitHub code released by the authors. The EAD attack has two decision rules for selecting the final adversarial example: the least elastic-net (EN) rule and the least L1 distortion (L1) rule. Usually, the L1 decision rule achieves lower L1 distortion than the EN decision rule, as the EN rule considers a mixture of the L1 and L2 distortions. We use the L1 decision rule for a fair comparison.

5.2. Attack Success Rate and Distortion for ADMM L2 Attack

The ADMM L2 attack is compared with the FGM, IFGM and C&W L2 attacks. The attack success rate (ASR) represents the percentage of constructed adversarial examples that are successfully classified as their target labels. The average distortion of all successful adversarial examples is reported; for an ASR of zero, the distortion is not available (N.A.). We craft adversarial examples on MNIST, CIFAR-10 and ImageNet. For MNIST and CIFAR-10, 1000 correctly classified images are randomly selected from the test sets and 9 target labels are tested for each image, so we perform 9000 attacks for each dataset using each attack method. For ImageNet, 100 correctly classified images are randomly selected and 9 random target labels are used for each image.

The parameter ρ is fixed to 20. The number of ADMM iterations is set to 10. In each ADMM iteration, the Adam optimizer is utilized to solve the second subproblem based on stochastic gradient descent. When using the Adam optimizer, we perform a 9-step binary search on the parameter c (starting from 0.001) and run 1000 learning iterations for each c, with learning rate 0.02 for MNIST and 0.002 for CIFAR-10 and ImageNet. The attack transferability parameter κ is set to 0.

Table 1 shows the results on MNIST, CIFAR-10 and ImageNet. As we can see, FGM fails to generate adversarial examples with a high success rate since it is designed to be fast, rather than optimal. Among the IFGM, C&W and ADMM attacks, ADMM achieves the lowest L2 distortion for the best case, average case and worst case. IFGM has larger distortions compared with the C&W and ADMM attacks on the three datasets, especially on ImageNet. For MNIST, the ADMM attack reduces the L2 distortion by about 7% compared with the C&W attack. This becomes more prominent on ImageNet, where ADMM reduces the L2 distortion by 19% compared with C&W in the worst case.

We also observe that on CIFAR-10, the ADMM attack achieves lower distortions, but the reductions are not as prominent as those on MNIST or ImageNet. The reason may be that CIFAR-10 is the easiest dataset to attack, since it requires the lowest distortion among the three datasets, so both the ADMM attack and the C&W attack can achieve quite good performance. Note that in most cases on the three datasets, the ADMM L2 attack achieves lower L1, L2, and L∞ distortions than the C&W L2 attack, indicating a comprehensive enhancement of the ADMM attack over the C&W attack.

Dataset | Attack Method | Best Case (ASR, L0) | Average Case (ASR, L0) | Worst Case (ASR, L0)
MNIST | C&W(L0) | 100, 8.1 | 100, 17.48 | 100, 31.48
MNIST | ADMM(L0) | 100, 8 | 100, 15.71 | 100, 25.87
CIFAR-10 | C&W(L0) | 100, 8.6 | 100, 19.6 | 100, 34.4
CIFAR-10 | ADMM(L0) | 100, 8.25 | 100, 18.8 | 100, 31.2
Table 2. Adversarial attack success rate (ASR, %) and L0 distortion of the ADMM and C&W L0 attacks for MNIST and CIFAR-10
Data Set | Attack Method | Best Case (ASR, L1, L2, L∞) | Average Case (ASR, L1, L2, L∞) | Worst Case (ASR, L1, L2, L∞)
MNIST | FGM(L1) | 100, 29.6, 2.42, 0.57 | 36.5, 51.2, 3.99, 0.8 | 0, N.A., N.A., N.A.
MNIST | IFGM(L1) | 100, 18.7, 1.6, 0.41 | 100, 33.9, 2.6, 0.58 | 100, 54.8, 4.04, 0.81
MNIST | EAD(L1) | 100, 7.08, 1.49, 0.56 | 100, 12.5, 2.08, 0.77 | 100, 18.8, 2.57, 0.92
MNIST | ADMM(L1) | 100, 6.0, 2.07, 0.97 | 100, 10.61, 2.72, 0.99 | 100, 16.6, 3.41, 1
CIFAR-10 | FGM(L1) | 98.5, 18.25, 0.53, 0.057 | 47, 48.32, 1.373, 0.142 | 1, 33.99, 0.956, 0.101
CIFAR-10 | IFGM(L1) | 100, 6.28, 0.184, 0.21 | 100, 13.72, 0.394, 0.44 | 100, 22.84, 0.65, 0.74
CIFAR-10 | EAD(L1) | 100, 2.44, 0.31, 0.084 | 100, 6.392, 0.6, 0.185 | 100, 10.21, 0.865, 0.31
CIFAR-10 | ADMM(L1) | 100, 2.09, 0.319, 0.102 | 100, 5.0, 0.591, 0.182 | 100, 7.453, 0.77, 0.255
ImageNet | FGM(L1) | 12, 229, 0.73, 0.028 | 1, 67, 0.165, 0.08 | 0, N.A., N.A., N.A.
ImageNet | IFGM(L1) | 93, 311, 0.966, 0.033 | 67, 498.5, 1.5, 0.051 | 47, 720.2, 2.2, 0.08
ImageNet | EAD(L1) | 100, 65.4, 0.632, 0.047 | 100, 165.5, 1.02, 0.06 | 100, 290, 1.43, 0.08
ImageNet | ADMM(L1) | 100, 56.1, 0.904, 0.053 | 100, 92.7, 1.15, 0.0784 | 100, 142.1, 1.473, 0.102
Table 3. Adversarial attack success rate (ASR, %) and L1, L2, and L∞ distortions of different L1 attacks for different datasets

5.3. Attack Success Rate and Distortion for ADMM L0 Attack

The performance of the ADMM L0 attack in terms of attack success rate and the L0 norm of the distortion is demonstrated in this section. The ADMM L0 attack is compared with the C&W L0 attack on MNIST and CIFAR-10. 500 images are randomly selected from the test sets of MNIST and CIFAR-10, respectively. Each image has 9 target labels, so we perform 4500 attacks for each dataset using either the ADMM or the C&W attack.

For the ADMM L0 attack, 9 binary search steps are performed to search for the parameter ρ, while c is fixed to 20 for MNIST and 200 for CIFAR-10. The initial value of ρ is set to 3 for MNIST and 40 for CIFAR-10, respectively. The number of ADMM iterations is 10. In each ADMM iteration, the Adam optimizer is utilized to solve the second subproblem with 1000 Adam iterations, while the learning rate is set to 0.01 for both MNIST and CIFAR-10.

The results of the L0 attacks are shown in Table 2. As observed from the table, both the C&W and ADMM attacks achieve a 100% attack success rate. For the best case, the C&W attack and the ADMM attack have relatively close performance in terms of L0 distortion. For the worst case, the ADMM attack achieves lower L0 distortion than C&W, reducing the distortion by up to 17% on MNIST. We also note that the differences between the C&W and ADMM attacks are smaller on CIFAR-10 than on MNIST.

Data Set | Attack Method | Best Case (ASR, L∞, L1, L2) | Average Case (ASR, L∞, L1, L2) | Worst Case (ASR, L∞, L1, L2)
MNIST | FGM(L∞) | 100, 0.194, 84.9, 4.04 | 35, 0.283, 122.7, 5.85 | 0, N.A., N.A., N.A.
MNIST | IFGM(L∞) | 100, 0.148, 50.9, 2.48 | 100, 0.233, 71.2, 3.44 | 100, 0.378, 96.8, 4.64
MNIST | ADMM(L∞) | 100, 0.135, 35.9, 2.068 | 100, 0.178, 48, 2.73 | 100, 0.218, 60.2, 3.37
CIFAR-10 | FGM(L∞) | 100, 0.015, 42.8, 0.78 | 53, 0.48, 136, 2.5 | 1.5, 0.31, 712, 14
CIFAR-10 | IFGM(L∞) | 100, 0.0063, 14.36, 0.28 | 100, 0.015, 26.2, 0.54 | 100, 0.026, 37.7, 0.826
CIFAR-10 | ADMM(L∞) | 100, 0.0061, 12.8, 0.25 | 100, 0.0114, 23.07, 0.47 | 100, 0.017, 31.9, 0.65
ImageNet | FGM(L∞) | 20, 0.0873, 22372, 43.55 | 1.5, 0.0005, 134, 0.26 | 0, N.A., N.A., N.A.
ImageNet | IFGM(L∞) | 100, 0.0046, 542.4, 1.27 | 100, 0.0128, 1039.6, 2.54 | 100, 0.0253, 1790.2, 4.4
ImageNet | ADMM(L∞) | 100, 0.0041, 280.2, 0.773 | 100, 0.0059, 427.7, 1.10 | 100, 0.0092, 624.1, 1.6
Table 4. Adversarial attack success rate (ASR, %) and L∞, L1, and L2 distortions of different L∞ attacks for different datasets

5.4. Attack Success Rate and Distortion for ADMM L1 Attack

We compare the ADMM L1 attack with the FGM, IFGM and EAD [7] attacks. The attack success rate (ASR) and the average distortion of all successful adversarial examples are reported. We perform the adversarial attacks on MNIST, CIFAR-10 and ImageNet. For MNIST and CIFAR-10, 1000 correctly classified images are randomly selected from the test sets and 9 target labels are tested for each image, so we perform 9000 attacks for each dataset using each attack method. For ImageNet, 100 correctly classified images and 9 target labels are randomly selected.

The number of ADMM iterations is set to 80. In each ADMM iteration, the Adam optimizer is utilized to solve the second subproblem based on stochastic gradient descent; we run 2000 learning iterations with an initial learning rate of 0.1 for MNIST and 0.001 for CIFAR-10 and ImageNet. The parameter ρ is fixed to 2 for MNIST, 40 for CIFAR-10, and 200 for ImageNet. The parameter c is fixed to 10 for MNIST, 300 for CIFAR-10, and 2000 for ImageNet. Note that we do not perform binary search on c or ρ, as fixed c and ρ achieve good performance.

The results of the ADMM L1 attack are shown in Table 3. We observe that both the EAD and ADMM attacks achieve a 100% attack success rate, while the FGM attack performs poorly and the IFGM attack cannot guarantee a 100% ASR on ImageNet. The ADMM L1 attack achieves the best performance compared with the FGM, IFGM, and EAD attacks. As demonstrated in Table 3, the L1 distortions of the ADMM and EAD attacks are relatively close in the best case, while the improvement of the ADMM attack over the EAD attack is much larger in the worst case. In the best case, the ADMM attack crafts adversarial examples with an L1 norm about 14% smaller than that of the EAD attack on MNIST, CIFAR-10 and ImageNet. In the worst case, the L1 norm of the ADMM attack is about 28% lower on CIFAR-10 and 50% lower on ImageNet compared with that of the EAD attack.

5.5. Attack Success Rate and Distortion for ADMM L∞ Attack

The ADMM L∞ attack is compared with the FGM and IFGM attacks. The attack success rate (ASR) and the average distortion of all successful adversarial examples are reported. We perform the adversarial attacks on MNIST, CIFAR-10 and ImageNet. For MNIST and CIFAR-10, 1000 correctly classified images are randomly selected from the test sets and 9 target labels are tested for each image, so we perform 9000 attacks for each dataset using each attack method. For ImageNet, 100 correctly classified images and 9 target labels are randomly selected.

The parameter ρ is fixed to 0.1. The number of ADMM iterations is 100 and the batch size is 90. In each ADMM iteration, the Adam optimizer is utilized to solve the first and second subproblems based on stochastic gradient descent. The Adam optimizer runs 1000 iterations to obtain the solution of the first subproblem, while it executes 2000 iterations to solve the second subproblem. Note that in the second subproblem, c is fixed to 0.1, as we find that a fixed c achieves good performance and there is no need to perform binary search on c. The initial learning rate is set to 0.001 for MNIST and 0.002 for CIFAR-10 and ImageNet. The attack transferability parameter κ is set to 0 when we do not perform the transferability evaluation.

The results of the ADMM L∞ attack are demonstrated in Table 4. We observe that both the IFGM and ADMM attacks achieve a 100% attack success rate, while FGM performs poorly. The ADMM attack achieves the best performance compared with the FGM and IFGM attacks. We also note that the L∞ norms of the ADMM and IFGM attacks are relatively close in the best case: the L∞ distortion of the ADMM attack is smaller than that of the IFGM attack by no more than about 10%. In the worst case, the improvement of the ADMM attack over the IFGM attack is much more pronounced. The L∞ distortion of the ADMM attack is about 40% smaller than that of the IFGM attack on MNIST and CIFAR-10 in the worst case, and on ImageNet the L∞ norm of the ADMM attack is 64% lower than that of the IFGM attack.

5.6. ADMM Attack Against Defensive Distillation and Adversarial Training

The ADMM attacks can break undefended DNNs with a high success rate. They are also able to break DNNs protected by defensive distillation. We perform the C&W L2 attack and the ADMM L0, L1, L2, and L∞ attacks for different temperature parameters T on MNIST and CIFAR-10. 500 randomly selected images are used as sources to generate 4500 adversarial examples (9 targets for each image) on MNIST or CIFAR-10. We find that the attack success rates of the C&W attack and the four ADMM attacks are all 100% for the different temperatures. Since distillation at temperature T causes the values of the logits to become approximately T times larger while the relative values of the logits remain unchanged, the C&W attack and the ADMM attacks, which work on the relative values of the logits, do not fail.

We further test the ADMM L2 attack against adversarial training on MNIST. The C&W L2 attack and the ADMM L2 attack are utilized to separately generate 9000 adversarial examples, with 1000 randomly selected images from the training set as sources. Then we add the adversarial examples with correct labels into the training dataset and retrain the network with the enlarged training dataset. With the retrained networks, we perform the ADMM L2 attack on the adversarially trained networks (one with C&W adversarial examples, and one with ADMM adversarial examples), as shown in Fig. 2. The ADMM L2 attack breaks all three networks (one unprotected, one retrained with C&W adversarial examples, and one retrained with ADMM adversarial examples) with a 100% success rate. The L2 distortions on the latter two networks are higher than that on the first network, showing some defense effect of adversarial training. We also note that the L2 distortion on the third network is higher than that on the second network, which demonstrates a higher defense efficiency when performing adversarial training with ADMM adversarial examples (partly because the ADMM attack is stronger).

Figure 2. L2 distortion of adversarial training for the three cases on MNIST

5.7. Attack Transferability

Here we test the transferability of the ADMM adversarial attack. For each value of the confidence parameter κ, we use the ADMM L2 attack and the C&W L2 attack to generate 9000 adversarial examples on MNIST, respectively. These examples are then applied to attack a defensively distilled network. The ASR is reported in Fig. 3. As demonstrated in Fig. 3, when κ is small, the ADMM attack can hardly achieve success on the defensively distilled network, which means the generated adversarial examples are not strong enough to break the defended network; low transferability of the generated adversarial examples is observed when κ is low. As κ increases, the ASRs of the three cases increase, demonstrating increasing transferability, until they reach their maximum values. At the peak, the ASR of the average case is nearly 98%, meaning that most of the adversarial examples generated on the undefended network can also break the defensively distilled network. Also note that as κ increases further, the ASRs of the average case and worst case decrease. The reason is that it is quite difficult to generate adversarial examples even for the undefended network when κ is very large; thus a decrease in the ASR is observed for the average and worst cases, and the advantage of strongly transferable adversarial examples is mitigated by the difficulty of generating such strong attacks. We also note that for large κ, the ASRs of the ADMM attack in the average and worst cases are higher than those of the C&W attack, demonstrating the higher transferability of the ADMM attack.

Figure 3. Transferability evaluation of the C&W and ADMM L2 attacks on MNIST

6. Conclusion

In this paper, we propose an ADMM-based general framework for adversarial attacks. Under the ADMM framework, L0, L1, L2, and L∞ attacks are proposed and implemented. We compare the ADMM attacks with state-of-the-art adversarial attacks, showing that the ADMM attacks are so far the strongest. The ADMM attack is also applied to break two defense methods, defensive distillation and adversarial training. Experimental results show the effectiveness of the proposed ADMM attacks along with their strong transferability.

References