A principled approach for generating adversarial images under non-smooth dissimilarity metrics

08/05/2019
by   Aram-Alexandre Pooladian, et al.
McGill University

Deep neural networks are vulnerable to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology catered not only for cases where the perturbations are measured by ℓ_p norms, but in fact any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, ℓ_1, ℓ_2, ℓ_∞ perturbations, and the ℓ_0 counting "norm", i.e. true sparseness. Our approach to generating perturbations is a natural extension of our recent work, the LogBarrier attack, which previously required the metric to be differentiable. We demonstrate our new algorithm, ProxLogBarrier, on the MNIST, CIFAR10, and ImageNet-1k datasets. We attack undefended and defended models, and show that our algorithm transfers to various datasets with little parameter tuning. In particular, in the ℓ_0 case, our algorithm finds significantly smaller perturbations compared to multiple existing methods.


1. Introduction

Deep neural networks are vulnerable to adversarial perturbations: “imperceptibly small” (measured via a dissimilarity metric) changes in the model input lead to misclassification [30, 13]. The existence of small-norm adversarial attacks can be interpreted to mean that while models generalize well on natural images and are robust to random perturbations, they nevertheless lose accuracy on worst-case perturbations. The vulnerability of DNNs is a potentially grave security risk. In developing strong perturbations, we hope that better defense mechanisms will be deployed to prevent these attacks from occurring in practice.

Adversarial attacks are often broadly categorized into one of two types: white-box attacks, where the full structure of the neural network is provided to the attacker, including gradient information, and black-box attacks, where the attacker is only given the model decision. One of the first proposed adversarial attacks is the Fast Gradient Sign Method (FGSM), which generates an adversarial image with respect to the ℓ_∞ norm, along with its iterative form, Iterative FGSM (IFGSM) [13, 17]. A similar iterative attack was also developed with respect to the ℓ_2 norm. In their purest form, the above attacks perform gradient ascent on the loss function subject to a norm constraint on the perturbation: one step in the case of FGSM, and multiple steps in the case of IFGSM and its ℓ_2 equivalent. Apart from loss maximization, attacks have been developed using loss functions that directly measure misclassification [5, 22]. There is also the problem of generating “realistic” attacks, such as sparse attacks; these include, for example, small stickers on a road sign, which may tamper with autonomous vehicles [10]. In the black-box setting, adversarial examples are generated using only model outputs or model decisions, which is a much more expensive endeavor. However, black-box methods can sometimes perform better, most notably by avoiding gradient obfuscation and by taking advantage of sampling properties near the decision boundary. Notable examples of black-box (decision-based) attacks are the Boundary Attack [4] and the recent HopSkipJumpAttack [6].
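
In its simplest form, writing L for the training loss, f for the model, and y for the true label (and clipping back to the image range, which is the usual implementation detail rather than something stated above), a single FGSM step with ℓ_∞ budget ε reads

```latex
x_{\mathrm{adv}} \;=\; \Pi_{[0,1]^n}\!\Bigl(x + \varepsilon\,\operatorname{sign}\bigl(\nabla_x L(f(x), y)\bigr)\Bigr),
```

and IFGSM repeats this update with a smaller step, re-projecting onto the ε-ball around x after each iteration.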

The development of new and improved adversarial attacks has occurred in parallel with various defensive training regimens that provide robustness against adversarial perturbations. The task of training a robust network is two-fold: models must be resistant to perturbations of a certain magnitude, while also maintaining classification ability on clean data. It has been argued that these two objectives are inherently “at odds” [31]. A popular method for training robust networks is adversarial training, where adversarial examples are added to the training data (see for example [19]). While effective, adversarial training has not scaled well with increasing network size. Two recent methods have tackled this problem for large networks on ImageNet-1k [29, 11]; this was previously not possible without a massive computational infrastructure [32, 15].

Contributions

This paper introduces an attack methodology catered not just to ℓ_p norms, but to any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, ℓ_1, ℓ_2, and ℓ_∞ perturbations, and the ℓ_0 counting “norm”, i.e. a true measurement of the sparseness of the perturbation. Our approach adopts the relaxation structure of the recently proposed LogBarrier attack [12], which required differentiable metrics. We extend this work to a broad class of non-smooth (non-differentiable) metrics. Our algorithm, ProxLogBarrier, uses the proximal gradient method to generate adversarial perturbations. We demonstrate our attack on the MNIST, CIFAR10, and ImageNet-1k datasets. ProxLogBarrier shows significant improvement over both the LogBarrier attack and the other attacks we considered. In particular, in the ℓ_0 case, we achieve state-of-the-art results with respect to a suite of attacks typically used for this problem class.

2. Problem formulation of adversarial attacks and background

Let X ⊂ R^n be the image space, and Y the label space (the unit simplex for K classes). An image-label pair is denoted (x, y), with the image x belonging to one of the K classes. The trained model is a map f : X → Y. An adversarial perturbation is supposed to be small with respect to a dissimilarity metric (henceforth simply called a metric) d(·, ·), e.g. an ℓ_p norm. Formally, the optimal adversarial perturbation is the minimizer of the following constrained optimization problem:

   min_{x'} d(x, x')   subject to   C(x') ≠ y,                                  (1)

where C(x) = argmax_k f_k(x) is the classification function for the trained network. When using the cross-entropy classification function, problem (1) can be written as

   min_{x'} d(x, x')   subject to   f_y(x') ≤ max_{j≠y} f_j(x').                (2)

DNNs might be powerful classifiers, but that does not mean their decision boundaries are well behaved. Instead, researchers have popularized using the cross-entropy loss L as a surrogate for the decision boundary: typically a model is trained until the loss is very low, which is often related to good classification performance. Thus, instead of solving (1), one can perform Projected Gradient Ascent (PGA) on the cross-entropy loss:

   max_{x' : d(x, x') ≤ ε} L(f(x'), y),                                          (3)

where d is typically taken to be either the ℓ_∞ or the ℓ_2 norm, and ε defines the perturbation threshold of interest.
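
For concreteness, the following is a minimal PyTorch-style sketch of the iterative ℓ_∞ variant of (3); the function name, step size, budget, and iteration count are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def pga_linf(model, x, y, eps=0.1, step=0.02, iters=20):
    # Iterative projected gradient ascent on the cross-entropy loss,
    # constrained to an l_inf ball of radius eps around the clean image x.
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()        # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the l_inf ball
            x_adv = x_adv.clamp(0.0, 1.0)             # stay in the image space
    return x_adv.detach()
```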

Some adversarial attack methods try to solve the problem posed in (1) without incorporating the loss function used to train the network. For example, Carlini & Wagner attack the logit layer of a network and solve a different optimization problem, depending on the choice of norm [5]. With regards to adversarial defense, they demonstrated how a significant number of adversarial defense methods fail because of “gradient obfuscation”, where gradients are small only locally around the image. However, this is in fact an artifact of the softmax layer [1]. Another metric of adversarial dissimilarity is the ℓ_0 “norm”, which counts the number of pixels that differ between the adversary and the clean image [21, 24]. This is of interest because an adversary might also have to budget the number of pixels they are allowed to perturb, while still remaining “imperceptible” to the human eye. For example, the sticker attack [10] is a practical attack with real-world consequences, and it does not interfere with every single part of the image.

3. Our method: ProxLogBarrier

We consider the formulation of adversarial attacks as in (2),

   min_{x'} d(x, x')   subject to   max_{j≠y} Z_j(x') ≥ Z_y(x').                 (4)

Here, we abbreviate by Z(x') the model output before the softmax layer that “projects” onto Y. This problem is difficult, as the constraints have virtually no exploitable structure. The problem can be relaxed using a logarithmic barrier, a technique often used in traditional optimization [23],

   min_{x'} d(x, x') − λ log( max_{j≠y} Z_j(x') − Z_y(x') ).                      (5)

The objective function now includes the constraint that enforces misclassification. In [12], (5) was originally solved via gradient descent, which necessarily assumes that d is at least differentiable. Most dissimilarity metrics are not differentiable, for example the ℓ_∞ norm; in the original LogBarrier paper [12], a smooth approximation of this norm was used to get around this issue.

For brevity, let φ(x') := −λ log( max_{j≠y} Z_j(x') − Z_y(x') ); then optimization problem (5) becomes

   min_{x'} d(x, x') + φ(x').                                                     (6)

This relaxed problem has a composite structure, with φ being smooth provided max_{j≠y} Z_j(x') > Z_y(x'), i.e. provided the iterate remains misclassified. We turn to the proximal gradient method to efficiently solve this problem and outline it in the following section. Instead of requiring d to be differentiable, we only require that it admit a closed proximal form. This assumption is satisfied for most of the dissimilarity metrics considered in the adversarial attack literature.
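
A minimal PyTorch sketch of the barrier term φ in (5)–(6), computed for a batch of logits (the function name and tensor layout are illustrative, not the paper's code):

```python
import torch

def log_barrier(logits, y, lam):
    # phi(x') = -lam * log( max_{j != y} Z_j(x') - Z_y(x') ); finite only when
    # each sample in the batch is already misclassified (positive margin).
    z_true = logits.gather(1, y.view(-1, 1)).squeeze(1)
    z_other = logits.scatter(1, y.view(-1, 1), float("-inf")).max(dim=1).values
    return -lam * torch.log(z_other - z_true)
```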

3.1. Proximal gradient method

Proximal algorithms are a driving force for nonsmooth optimization problems, and are receiving more attention in the deep learning community on a myriad of problems [2, 34, 20, 25]. For a full discussion on this topic, we suggest [3].

We consider the following framework for proximal algorithms, namely a composite minimization problem

   min_{u ∈ E} F(u) := f(u) + g(u),                                               (7)

where E is a Euclidean space. We make the following assumptions:

  • g is a non-degenerate, closed, and convex function over E;

  • f is a non-degenerate, closed function with convex domain, and has L-Lipschitz gradients over the interior of its domain;

  • the solution set of (7) is non-empty.

Solving this composite problem with gradient descent is not advisable, since g is not necessarily differentiable. The best one can hope for is that g has a subgradient at x, defined as an element v ∈ E such that

   g(u) ≥ g(x) + ⟨v, u − x⟩   for all u ∈ E.                                       (8)

The collection of subgradients of g at x is called the subdifferential of g at x, denoted ∂g(x). When a function is differentiable, the subdifferential is a singleton, namely {∇g(x)}. One could turn to subgradient descent to solve the composite problem; however, a subgradient might not always be helpful. For example, consider the subgradient of the ℓ_∞ norm at a point x in R^n,

   ∂‖x‖_∞ = conv{ sign(x_i) e_i : |x_i| = ‖x‖_∞ },

where the e_i are the standard basis vectors. At each subgradient step, very little information is obtained.
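
For instance, with arbitrarily chosen values,

```latex
x = (3,\,-3,\,1) \in \mathbb{R}^3
\quad\Longrightarrow\quad
\partial\lVert x\rVert_\infty = \operatorname{conv}\{\,e_1,\;-e_2\,\},
```

so every subgradient is supported on at most two coordinates and says nothing about the remaining ones; in image dimensions this makes plain subgradient descent extremely uninformative per iteration.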


Since (6) is a non-convex problem (because φ is potentially not convex), our goal is to iteratively generate a sequence {x_k} that converges to a stationary point x*, i.e. a point satisfying 0 ∈ ∇f(x*) + ∂g(x*). A characterization of these stationary points is the following fixed-point representation (we take t > 0):

   0 ∈ ∇f(x*) + ∂g(x*)   ⟺   x* = prox_{t g}( x* − t ∇f(x*) ),

where prox_{t g} is defined as the proximal operator of t g,

   prox_{t g}(u) := argmin_{z ∈ E} { g(z) + (1/(2t)) ‖z − u‖² }.                   (9)

The first line in the equivalence chain uses the additivity of the subdifferential, which is guaranteed by our assumptions; the rest is algebraic manipulation. Thus, to generate a stationary point, it suffices to find a fixed point of the sequence generated in the following manner:

   x_{k+1} = prox_{t g}( x_k − t ∇f(x_k) ),                                        (10)

where t > 0 is some step size. The proximal operator exists for any convex function, but convexity is not a strict requirement.
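
The iteration (10) in code form (a generic sketch; the function and argument names are ours):

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, step, iters=1000):
    # Fixed-step proximal gradient: x_{k+1} = prox_{t g}(x_k - t * grad_f(x_k)).
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = prox_g(x - step * grad_f(x), step)
    return x

# Example: minimize 0.5*(x - 3)^2 + |x|; the solution is soft-threshold(3, 1) = 2.
sol = proximal_gradient(
    grad_f=lambda x: x - 3.0,
    prox_g=lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0),
    x0=np.zeros(1), step=0.5,
)
```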

Despite the lack of convexity, the sequence of iterates generated in this way still enjoys convergence properties. The following theorem is a simplified version of what can be found in [3] (Section 10.3, with proof), and is the main motivation for our proposed method.

Theorem 1.

Given the assumptions on (7), let {x_k} be the sequence generated by (10) with fixed step size t ≤ 1/L. Then,

  (a) the sequence {F(x_k)} is non-increasing; in addition, F(x_{k+1}) < F(x_k) if and only if x_k is not a stationary point of (7);

  (b) ‖x_{k+1} − x_k‖ → 0 as k → ∞;

  (c) all limit points of the sequence {x_k} are stationary points of (7).

3.2. ProxLogBarrier attack algorithm

We iteratively find a minimizer for (6); the attack is outlined in Algorithm 1. Due to the highly non-convex nature of the decision boundary, we perform a backtracking step to ensure the proposed iterate is in fact adversarial, thus keeping φ smooth. We remark that the adversarial attack problem is constrained to the image space, and thus requires a further projection step back onto the image space (we consider pixels to be in the range [0,1]). In traditional non-convex optimization, best practice is to also record the “best iterate”, as valleys are likely pervasive throughout the decision boundary. This way, even if at some point our gradient sends the iterate far off and it is unable to return in the remaining iterations, we already have a better candidate.

  Input: image-label pair (x, y), trained model f, adversarial dissimilarity metric d
  Initialize hyperparameters: barrier parameter λ, decay factor μ, step size t, and number of iterations K.
  Initialize x_0 to be misclassified, and set x_best ← x_0
  for k = 0, …, K−1 do
     Every few iterations: λ ← μ λ
     y_k ← x_k − t ∇φ(x_k)
     x_{k+1} ← Π_{[0,1]^n}( prox_{t d(·, x)}(y_k) )
     Backtrack along the line between the current and previous iterate until misclassified
     if d(x_{k+1}, x) < d(x_best, x) then
        x_best ← x_{k+1}
     else
        keep x_best
     end if
  end for
  Output: x_best
Algorithm 1 ProxLogBarrier (PLB)
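
The following is a simplified PyTorch sketch of Algorithm 1. It is our own condensed rendering under stated assumptions, not the authors' exact implementation: `prox_d` is assumed to be the proximal operator of t·d(·, x) (already centered at the clean image), `dist` evaluates d(·, x), and the hyper-parameter values shown are placeholders rather than the paper's defaults.

```python
import torch

def prox_logbarrier_attack(model, x, y, dist, prox_d, x_init,
                           lam=0.1, mu=0.5, step=0.01, iters=500, decay_every=50):
    # Gradient step on the log-barrier term, proximal step on d(., x),
    # projection onto [0,1]^n, backtracking to stay misclassified,
    # and best-iterate tracking, as described in Section 3.2.
    def margin(u):
        # max_{j != y} Z_j(u) - Z_y(u); positive iff u is misclassified
        z = model(u)
        z_true = z.gather(1, y.view(-1, 1)).squeeze(1)
        z_other = z.scatter(1, y.view(-1, 1), float("-inf")).max(dim=1).values
        return z_other - z_true

    x_adv, x_best = x_init.clone(), x_init.clone()
    for k in range(iters):
        if k > 0 and k % decay_every == 0:
            lam *= mu                                   # tighten the barrier
        x_adv.requires_grad_(True)
        phi = -(lam * torch.log(margin(x_adv))).sum()   # barrier term of (6)
        grad, = torch.autograd.grad(phi, x_adv)
        with torch.no_grad():
            cand = prox_d(x_adv - step * grad, step)    # proximal step on d(., x)
            cand = cand.clamp(0.0, 1.0)                 # project onto the image space
            # backtrack toward the previous (misclassified) iterate if needed
            alpha = 1.0
            while margin(cand).min() <= 0 and alpha > 1e-3:
                alpha *= 0.5
                cand = (alpha * cand + (1 - alpha) * x_adv).clamp(0.0, 1.0)
            x_adv = cand
            if dist(x_adv, x) < dist(x_best, x):        # record the best iterate
                x_best = x_adv.clone()
    return x_best
```

For the metric d(·, x) = ‖· − x‖_1, for example, one would take prox_d = lambda v, t: x + soft_threshold(v − x, t), using the operators derived in the following subsection.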

Proximal operators for metrics of interest

To complete the algorithm, it remains to compute the proximal operator for various choices of d. One can turn to [3] for complete derivations of the proximal operators for the adversarial metrics we are considering, namely the ℓ_1, ℓ_2, and ℓ_∞ norms, and the cardinality function. Consider measuring the ℓ_∞ distance between the clean image and our desired adversarial perturbation, d(x', x) = ‖x' − x‖_∞. Due to the Moreau decomposition theorem [27], the proximal operator of this function relies on projecting onto the unit ℓ_1 ball:

   prox_{t ‖· − x‖_∞}(u) = u − t Π_{B_1}( (u − x) / t ),

where Π_{B_1} denotes Euclidean projection onto the unit ℓ_1 ball. We make use of the algorithm from [9] to perform the projection step, implemented over batches of vectors for efficiency. Similarly, one obtains the proximal operators for ℓ_1 and ℓ_2 via the same theorem,

   prox_{t ‖· − x‖_1}(u) = x + S_t(u − x),     prox_{t ‖· − x‖_2}(u) = x + (1 − t/‖u − x‖_2)_+ (u − x),

where S_t is the well-known soft-thresholding operator. In the case that one wants to minimize the number of perturbed pixels in the adversarial image, one can turn to the counting “norm” ℓ_0, which counts the number of non-zero entries in a vector. While this function is non-convex, its proximal operator still has a closed form:

   prox_{t ‖· − x‖_0}(u) = x + H_{√(2t)}(u − x),

where H is a hard-thresholding operator, and acts component-wise in the case of vector arguments.
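
A minimal NumPy sketch of these proximal operators, applied to the centered variable v = u − x (the paper's implementation is batched; the names here are ours). The ℓ_∞ case uses the O(n log n) sorting-based projection onto the unit ℓ_1 ball from [9].

```python
import numpy as np

def prox_l1(v, t):
    # prox of t*||.||_1: soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_l0(v, t):
    # prox of t*||.||_0: hard-thresholding (keep entries with v_i^2 > 2t)
    return np.where(v**2 > 2.0 * t, v, 0.0)

def project_l1_ball(w, radius=1.0):
    # Euclidean projection onto {z : ||z||_1 <= radius}, via sorting.
    if np.abs(w).sum() <= radius:
        return w
    u = np.sort(np.abs(w))[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.max(np.nonzero(u - (css - radius) / j > 0)[0]) + 1
    theta = (css[rho - 1] - radius) / rho
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def prox_linf(v, t):
    # prox of t*||.||_inf via Moreau decomposition: the conjugate of the
    # l_inf norm is the indicator of the unit l_1 ball.
    return v - t * project_l1_ball(v / t)
```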

4. Experimental methodology

Outline

We compare the ProxLogBarrier attack with several other adversarial attacks on the MNIST [18], CIFAR10 [16], and ImageNet-1k [8] datasets. For MNIST, we use the network described in [24]; on CIFAR10, we use a ResNeXt network [33]; and for ImageNet-1k, ResNet50 [14, 7]. We also consider defended models for the aforementioned networks, both to further benchmark the attack capability of ProxLogBarrier and to reaffirm previous work in the area. For the defended models, we consider Madry-style adversarial training for CIFAR10 and MNIST [19]; on ImageNet-1k, we use the recently proposed scalable input-gradient regularization for adversarial robustness [11]. We randomly select 1000 test images to evaluate performance on MNIST and CIFAR10, and 500 test images on ImageNet-1k, and we use the same images on their defended counterparts. We note that for ImageNet-1k, we consider the problem of Top5 misclassification, where the logarithmic barrier is taken with respect to the constraint that the true class lies outside the five largest model outputs, i.e. Z_y(x') ≤ Z_(5)(x'), where Z_(5) denotes the fifth-largest entry of Z(x').
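
For instance, the Top5 analogue of the margin used in the barrier could be computed as follows (a sketch under our reading of the constraint; not the paper's code):

```python
import torch

def top5_margin(logits, y):
    # Z_(5)(x') - Z_y(x'): strictly positive iff the true class is outside the top five.
    z5 = logits.topk(5, dim=1).values[:, -1]         # fifth-largest logit per sample
    zy = logits.gather(1, y.view(-1, 1)).squeeze(1)  # true-class logit
    return z5 - zy
```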

We compare the ProxLogBarrier attack with a wide range of attack algorithms that are available through the FoolBox adversarial attack library [26]. For perturbations in ℓ_0, we compare against SparseFool [21], the Jacobian Saliency Map Attack (JSMA) [24], and Pointwise [28] (this latter attack is black-box). For ℓ_2 attacks, we consider the Carlini-Wagner attack (CW) [5], Projected Gradient Ascent (PGA) [17], DeepFool [22], and the original LogBarrier attack [12]. Finally, for ℓ_∞ perturbations, we consider PGA, DeepFool, and LogBarrier. All hyperparameters are left at their implementation defaults, with the exception of SparseFool, where we used the exact parameters indicated in the paper.

Implementation details for our algorithm

Depending on the metric, we initialize the adversarial image either with sufficiently large Gaussian noise or with uniform noise. Our recommended hyper-parameter defaults transfer well across all datasets and are the same parameters used throughout this paper. We observed some computational drawbacks on ImageNet-1k: first, the proximal operator for the ℓ_0 norm is far too strict, so we instead use the ℓ_1 norm to induce sparseness in the adversarial perturbation (adjusting both the prox parameter and the step size accordingly). The only other changes for the ImageNet-1k dataset are an adjusted proximal parameter and the use of 2500 algorithm iterations. Finally, we found that using the softmax-layer outputs helps with ImageNet-1k attacks against both the defended and undefended networks.

Reporting

For perturbations in ℓ_∞ and ℓ_2, we report the percent misclassification at threshold levels that are somewhat standard [31]. Our choices for ℓ_0 distance thresholds were arbitrary; however, we supplement with median perturbation distances for all attack norms to mitigate cherry-picking. For attacks that were unable to successfully perturb at least half of the sampled images, we do not report anything; if an attack was able to perturb more than half but not all, we add an asterisk to the median distance. We denote the defended models by “(D)” (recall that for MNIST and CIFAR10 we use Madry-style adversarial training, and scalable input-gradient regularization for ImageNet-1k).

4.1. Results

Perturbations in ℓ_0:

Our results for ℓ_0 attacks are found in Table 1, with examples available in Figure 1 and Figure 2. Across all datasets considered, ProxLogBarrier outperforms all other attack methods, on both defended and undefended networks. It also appears undeterred by Madry-style adversarial training on MNIST and CIFAR10; this is entirely reasonable, since Madry-style adversarial training is targeted towards ℓ_∞ attacks. In contrast, on ImageNet-1k, the defended model trained with input-gradient regularization performs significantly better than the undefended model, even though this defence is not aimed at ℓ_0 attacks. Neither JSMA nor Pointwise scales to networks on ImageNet-1k. Pointwise excels on smaller images, since it takes fewer than 1000 iterations to cycle over every pixel and check whether it can be zeroed out. We remark that SparseFool was unable to adversarially attack all images, whereas ProxLogBarrier always succeeded.

                    MNIST                        CIFAR10                      ImageNet
               % err.  % err.  median      % err.  % err.  median      % err.  % err.  median
PLB            86.30   100        6        44.10   68.50     39        66.00   80.20    268
SparseFool     46.00    99.40    11        15.60   22.60   3071        30.40   46.80      –
JSMA           12.73    61.38    25        29.56   48.92     84            –       –      –
Pointwise       5.00    57.30    28        13.20   50.60     80            –       –      –
(D) PLB        79.8     98.90     6        74.90   97.80     13        38.40   70.0     691
(D) SparseFool 20.67    75.45    20        34.23   52.15     70        24.80   41.80      –
(D) JSMA       12.63    44.51    34        36.65   60.79     53            –       –      –
(D) Pointwise  12.50    65.80    24        23.80   43.10    102            –       –      –

Table 1. Adversarial robustness statistics, measured in the ℓ_0 norm.
(a) ℓ_0 attacks on MNIST
(b) ℓ_0 attacks on CIFAR10
Figure 1. Adversarial images for ℓ_0 perturbations, generated by our method.
Figure 2. Examples of adversarial perturbations generated by ProxLogBarrier in the ℓ_0 adversarial metric on ImageNet-1k, where these perturbations have a pixel perturbation count of under 1000 out of 178608 total pixels.

Perturbations in ℓ_∞:

Results for ℓ_∞ perturbations are found in Table 2. Our attack stands out on MNIST, in both the defended and undefended cases. On CIFAR10, our attack is best on the undefended network, and only slightly worse than PGA on the adversarially defended one. On ImageNet-1k, our method suffers dramatically. This is likely due to very poor decision boundaries with respect to this norm: our method necessarily does better when the boundaries are not muddled, whereas PGA does not focus on the decision boundary explicitly and thus has more room to find something adversarial quickly.

                   MNIST               CIFAR10              ImageNet
               % err.  % err.      % err.  % err.       % err.  % err.
PLB            10.30   100         95.00   98.60        20.40   33.80
PGA            10.70    80.90      54.70   87.00        90.80   98.60
DeepFool        8.12    86.55      16.23   51.00        93.64  100
LogBarrier      5.89    73.90      60.60   93.10         7.60    7.70
(D) PLB         3.0     32.9       23.3    44.1         11.40   18.80
(D) PGA         2.8     23.6       22.9    46.1         49.20   96.60
(D) DeepFool    2.7     10.2       23.8    44.1         43.20   97.40
(D) LogBarrier  2.50    11.89      17.6    28.3          9.80   10.40

Table 2. Adversarial robustness statistics, measured in the ℓ_∞ norm.

Perturbations in ℓ_2:

Results for perturbations measured in Euclidean (ℓ_2) distance are found in Table 3. For MNIST and ImageNet-1k, on both defended and undefended networks, our attack performs better than all other methods, both in median distance and in error at a given perturbation threshold. On CIFAR10, we are best on the undefended network but lose to CW in the defended case. However, the CW attack did not scale to ImageNet-1k using the implementation in the FoolBox attack library.

                     MNIST                       CIFAR10                      ImageNet
               % err.  % err.  median      % err.  % err.  median      % err.  % err.  median
PLB            38.60   99.40   1.35        97.70   99.80      –        47.60   89.40      –
CW             35.10   98.30   1.41        89.94   95.97      –        20.06   44.26   1.16
PGA            24.70   70.00   1.70        60.60   73.30      –        37.60   70.60      –
DeepFool       13.21   48.04   2.35        17.33   22.04   1.11        40.08   76.48      –
LogBarrier     37.40   98.90   1.35        69.60   84.00      –        43.70   88.30      –
(D) PLB        29.50   92.90   1.54        28.7    35.4       –        15.80   28.20   1.74
(D) CW         28.24   78.59   1.72        29.6    38.7       –            –       –      –
(D) PGA        17.20   45.70   2.44        28.30   34.70      –        14.60   22.60   2.20
(D) DeepFool    5.22   18.07   3.73        28.0    33.3       –        15.60   24.40   2.14
(D) LogBarrier 25.00   89.60   1.65        28.0    34.6       –        10.00   10.20  63.17

Table 3. Adversarial robustness statistics, measured in the ℓ_2 norm.

Algorithm runtime:

We strove to implement ProxLogBarrier so that it runs in a reasonable amount of time; for that reason, it was implemented to work over a batch of images. Our code is publicly available at https://github.com/APooladian/ProxLogBarrierAttack. Using one consumer-grade GPU, we can comfortably attack several MNIST and CIFAR10 images simultaneously, but only one ImageNet-1k image at a time. We report in Table 4 the runtimes that achieve the statistics reported in the previous tables. Most of the other algorithms were taken from the FoolBox repository and were not written to take advantage of the GPU, hence we omit a runtime comparison with them. Heuristically speaking, PGA is one of the faster algorithms, whereas CW, SparseFool, and DeepFool are slower. We are not surprised that our attack in ℓ_0 takes longer than in the other norms; this is likely due to the backtracking step that ensures misclassification of the iterate. On ImageNet-1k, the ProxLogBarrier attack in the ℓ_∞ metric is quite slow due to the projection step onto the ℓ_1 ball, which is O(n log n), where n is the input dimension [9].

              Batch size    ℓ_0      ℓ_2      ℓ_∞
MNIST            100        8.35     6.91     6.05
CIFAR10           25       69.07    56.11    30.87
ImageNet-1k        1       35.45    29.47    75.50
Table 4. ProxLogBarrier attack runtimes (in seconds)

5. Conclusion

We have presented a concise framework for generating adversarial perturbations by incorporating the proximal gradient method. We have expanded upon the LogBarrier attack, which was originally only effective in the ℓ_2 and ℓ_∞ norms, by addressing the ℓ_0 case as well, and have thus proposed a method unifying all three common perturbation scenarios. Our approach requires fewer hyperparameter tweaks than LogBarrier, and performs significantly better than many of the attack methods we compared against, on both defended and undefended models and across all norm choices. We highlight that our method is, to our knowledge, the best choice for perturbations measured in ℓ_0, compared to all other methods available in FoolBox. We also perform better than all other attacks considered on the MNIST network, with respect to both median distance and commonly reported thresholds. While our paper focuses on norm-based adversarial metrics, it is worth noting that the proximal gradient method opens the door to potentially new adversarial metrics, provided they have a closed proximal form.

References