Image Super-Resolution as a Defense Against Adversarial Attacks

01/07/2019 · by Aamir Mustafa, et al.

Convolutional Neural Networks have achieved significant success across multiple computer vision tasks. However, they are vulnerable to carefully crafted, human-imperceptible adversarial noise patterns which constrain their deployment in critical security-sensitive systems. This paper proposes a computationally efficient image enhancement approach that provides a strong defense mechanism to effectively mitigate the effect of such adversarial perturbations. We show that deep image restoration networks learn mapping functions that can bring off-the-manifold adversarial samples onto the natural image manifold, thus restoring classifier beliefs towards correct classes. A distinguishing feature of our approach is that, in addition to providing robustness against attacks, it simultaneously enhances image quality and retains the model's performance on clean images. Furthermore, the proposed method does not modify the classifier, nor does it require a separate mechanism to detect adversarial images. The effectiveness of the scheme has been demonstrated through extensive experiments, where it proves to be a strong defense in both white-box and black-box attack settings. The proposed scheme is simple and has the following advantages: (1) it does not require any model training or parameter optimization, (2) it complements other existing defense mechanisms, (3) it is agnostic to the attacked model and attack type, and (4) it provides superior performance across all popular attack algorithms. Our code is publicly available at https://github.com/aamir-mustafa/super-resolution-adversarial-defense.


1 Introduction

The success of Convolutional Neural Networks (CNNs) over the past several years has led to their extensive deployment in a wide range of computer vision tasks, including image classification [1, 2, 3], object detection [4, 5], semantic segmentation [6, 7] and visual question answering [8]. Beyond these tasks, CNNs now play a pivotal role in many critical real-world systems, including self-driving cars [9] and models for disease diagnosis [10], which necessitates their robustness in such settings. Recent works [11, 12, 13], however, have shown that CNNs can easily be fooled by distorting natural images with small, well-crafted, human-imperceptible additive perturbations. These distorted images, known as adversarial examples, have further been shown to be transferable across different architectures; e.g., an adversarial example generated for an Inception v-3 model can also fool other CNN architectures [11, 14].

Fig. 1:

a) A 3D plot showing features of adversarial images (red) and of the corresponding clean images (green). b) On the right, we show the features of the corresponding defended images (blue). The plot clearly shows that the super-resolution operation remaps adversarial images, which otherwise lie off the manifold, onto the natural image manifold. (For better visualization, 100 randomly selected features are projected to 3D space using principal component analysis.)

Owing to the critical nature of security-sensitive CNN applications, significant research has been carried out to devise defense mechanisms against these vulnerabilities [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. We can broadly categorize these defenses along two directions. The first comprises model-specific mechanisms, which aim to regularize a specific model's parameters through adversarial training or parameter smoothing [18, 17, 15, 26, 24]. Such methods often require differentiable transformations that are computationally demanding. Moreover, these transformations remain vulnerable to further attacks, as an adversary can circumvent them by exploiting the differentiable modules. The second category of defenses is model-agnostic. These mitigate the effect of adversarial perturbations in the input image domain by applying various transformations. Examples of such techniques include JPEG compression [27, 28], foveation-based methods, which crop the image background [29], random pixel deflection [23], and random image padding and re-sizing [16]. Compared with differentiable model-specific methods, most model-agnostic approaches are computationally faster and carry out transformations in the input domain, making them more favorable. However, most of these approaches lose critical image content when removing adversarial noise, which results in poor classification performance on non-attacked images.

This paper proposes a model-agnostic defense mechanism against a wide range of recently proposed adversarial attacks [12, 13, 30, 31, 32, 33] and does not suffer from information loss. Our proposed defense is based upon image super-resolution (SR), which selectively adds high frequency components to an image and removes noisy perturbations added by the adversary. We hypothesize that the learned SR models are generic enough to remap off-the-manifold samples on to the natural image manifold (see Fig. 1). The effect of added noise is further suppressed by wavelet domain filtering and inherently minimized through a global pooling operation on the higher resolution version of the image. The proposed image super-resolution and wavelet filtering based defense results in a joint non-differentiable module, which can efficiently recover the original class labels for adversarially perturbed images.

The main contributions of our work are:

  1. Through extensive empirical evaluations, we show image super-resolution to be an effective defense strategy against a wide range of recently proposed state-of-the-art attacks in the literature [12, 13, 30, 31, 32, 33]. Using Class Activation Map visualizations, we demonstrate that super-resolution can successfully divert the attention of the classifier from random noisy patches to more distinctive regions of the attacked images (see Figs. 5 and 6).

  2. Super-resolving an adversarial image projects it back onto the natural image manifold learned by deep image classification networks.

  3. Unlike existing image transformation based techniques, which introduce artifacts in the process of overcoming adversarial noise, the proposed scheme retains critical image content, and thus minimally impacts the classifier's performance on clean, non-attacked images.

  4. The proposed defense mechanism tackles adversarial attacks with no knowledge of the target model's architecture or parameters, and can easily complement other existing model-specific defense methods.

Closely related to our approach are Defense-GAN [34] and MagNet [22], which first estimate the manifold of clean data to detect adversarial examples and then apply a mapping function to reduce adversarial noise. Since they use generator blocks to re-create images, their study is restricted to small datasets (CIFAR-10, MNIST) with low-resolution images. In contrast, our approach does not require any prior detection scheme and works for all types of natural images with a much more generic mapping function.

Below, we first formally define the underlying problem (Sec. 2.1), followed by a brief description of existing adversarial attacks (Sec. 2.2) and defenses (Sec. 2.3). We then present our proposed defense mechanism (Sec. 3). The effectiveness of the proposed defense is demonstrated through extensive experiments against state-of-the-art adversarial attacks [12, 13, 30, 31, 32, 33] and comparisons with other recently proposed model-agnostic defenses [35, 16, 36, 23] (Sec. 4).

2 Background

Here we introduce popular adversarial attacks and defenses proposed in the literature, which form the basis of our evaluations and are necessary for understanding our proposed defense mechanism. We focus only on adversarial examples in the domain of image classification, though such examples can be crafted for various other computer vision tasks as well.

2.1 Problem Definition

Let $x_c$ denote a clean image sample and $y_{true}$ its corresponding ground-truth label, where the subscript $c$ emphasizes that the image is clean. Untargeted attacks aim to mis-classify a correctly classified example into any incorrect category. For these attacks, given an image classifier $\mathcal{C}(\cdot)$, an additive perturbation $\rho$ is computed under the constraints that the generated adversarial example $x_{adv} = x_c + \rho$ looks visually similar to the clean image, i.e., $d(x_c, x_{adv}) \le \epsilon$ for some dissimilarity function $d(\cdot,\cdot)$, and that the corresponding labels are unequal, i.e., $\mathcal{C}(x_{adv}) \neq \mathcal{C}(x_c)$. Targeted attacks change the correct label to a specified incorrect label, i.e., they seek $x_{adv}$ such that $\mathcal{C}(x_{adv}) = y_{target}$, where $y_{target}$ is a specific class label with $y_{target} \neq y_{true}$. An attack is considered successful for an image sample if it can find its corresponding adversarial example under the given set of constraints. In practice, $d(\cdot,\cdot)$ is the $\ell_p$ norm between a clean image and its corresponding adversarial example, where $p \in \{0, 2, \infty\}$.

2.2 Adversarial Attacks

(a) Fast Gradient Sign Method (FGSM): This is one of the first attack methods, introduced by Goodfellow et al. [12]. Given a loss function $J(x_c, y_{true}; \theta)$, where $\theta$ denotes the network parameters, the goal is to maximize the loss as:

$\max_{x_{adv}} \; J(x_{adv}, y_{true}; \theta) \quad \text{subject to} \quad \|x_{adv} - x_c\|_\infty \le \epsilon. \qquad (1)$

FGSM is a single-step attack which finds the adversarial perturbation by moving along the sign of the gradient of the loss function w.r.t. the image ($\nabla_x J$):

$x_{adv} = x_c + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(x_c, y_{true}; \theta)\big). \qquad (2)$

Here $\epsilon$ is the step size, which essentially restricts the $\ell_\infty$ norm of the perturbation.
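For concreteness, the following is a minimal, illustrative PyTorch sketch of the FGSM update in Eq. 2; the model, pixel range and step size are assumptions, not the attack implementations used in our experiments.

```python
# Minimal FGSM sketch (assumes inputs in [0, 1] and a differentiable
# PyTorch classifier `model`); illustrative only.
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps=2.0 / 255):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)
    loss.backward()
    # Step along the sign of the gradient to maximize the loss (Eq. 2),
    # then clip back to the valid pixel range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.detach().clamp(0.0, 1.0)
```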

(b) Iterative Fast Gradient Sign Method (I-FGSM) is an iterative variant of FGSM, introduced by Kurakin et al. [13]. I-FGSM performs the update as follows:

$x_{adv}^{t+1} = \mathrm{clip}_{\epsilon}\Big(x_{adv}^{t} + \alpha \cdot \mathrm{sign}\big(\nabla_x J(x_{adv}^{t}, y_{true})\big)\Big), \qquad (3)$

where $x_{adv}^{0} = x_c$, $\alpha$ is the per-iteration step size, and after $T$ iterations, $x_{adv} = x_{adv}^{T}$.

(c) Momentum Iterative Fast Gradient Sign Method (MI-FGSM), proposed by Dong et al. [30], is similar to I-FGSM with the introduction of an additional momentum term, which stabilizes the direction of the gradient and helps in escaping local maxima. MI-FGSM is defined as follows:

$g^{t+1} = \mu \cdot g^{t} + \dfrac{\nabla_x J(x_{adv}^{t}, y_{true})}{\big\|\nabla_x J(x_{adv}^{t}, y_{true})\big\|_1}, \qquad (4)$

$x_{adv}^{t+1} = \mathrm{clip}_{\epsilon}\big(x_{adv}^{t} + \alpha \cdot \mathrm{sign}(g^{t+1})\big), \qquad (5)$

where $\mu$ is the decay factor and $x_{adv} = x_{adv}^{T}$ after $T$ iterations.
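A hedged sketch of the iterative update in Eqs. 3 to 5 follows; setting the decay factor mu to zero recovers plain I-FGSM. The step size, iteration count and pixel range are illustrative assumptions.

```python
# Iterative FGSM with optional momentum (Eqs. 3-5); mu = 0 gives I-FGSM.
# Assumes inputs in [0, 1] and a differentiable PyTorch classifier `model`.
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y_true, eps=16.0 / 255, alpha=2.0 / 255, steps=10, mu=1.0):
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Momentum term normalized by the L1 norm of the gradient (Eq. 4).
        g = mu * g + grad / (grad.abs().sum(dim=(1, 2, 3), keepdim=True) + 1e-12)
        # Gradient-sign step, projected back into the eps-ball and valid range (Eqs. 3, 5).
        x_adv = torch.max(torch.min(x_adv.detach() + alpha * g.sign(), x + eps), x - eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```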

(d) DeepFool was proposed by Moosavi-Dezfooli et al. [31] and aims to minimize the $\ell_2$ norm between a given image and its adversarial counterpart. The attack assumes that a given image resides in a specific class region surrounded by the decision boundaries of the classifier. The algorithm then iteratively projects the image across the decision boundaries, which are locally approximated by a polyhedron, until the image crosses a boundary and is mis-classified.

(e) Carlini and Wagner (C&W) [32] is a strong iterative attack that minimizes over an auxiliary variable $w$ as follows:

$\min_{w} \; \Big\|\tfrac{1}{2}\big(\tanh(w)+1\big) - x_c\Big\|_2^{2} + c \cdot f\Big(\tfrac{1}{2}\big(\tanh(w)+1\big)\Big), \qquad (6)$

where $\tfrac{1}{2}(\tanh(w)+1) - x_c$ is the perturbation $\rho$ and $f(\cdot)$ is defined as

$f(x_{adv}) = \max\Big( Z(x_{adv})_{y_{true}} - \max\big\{ Z(x_{adv})_{k} : k \neq y_{true} \big\},\; -\kappa \Big). \qquad (7)$

Here $Z(\cdot)_{k}$ are the logit values corresponding to a class $k$ and $\kappa$ is the margin parameter. The C&W attack works for various $\ell_p$ norms.

(f) DIFGSM and MDIFGSM [33]: The aforementioned attacks can be grouped into single-step and iterative attacks. Iterative attacks have a higher success rate under white-box conditions, but they tend to overfit and generalize poorly to black-box settings. In contrast, single-step attacks generate perturbed images with fairly improved transferability but a lower success rate in white-box conditions. The recently proposed Diverse-Input-Iterative-FGSM (DIFGSM) and Momentum-Diverse-Input-Iterative-FGSM (MDIFGSM) [33] methods aim to fill this gap and improve the transferability of iterative attacks. DIFGSM applies random image re-sizing and padding as an image transformation $T(x; p)$, thus creating an augmented set of images, which are then attacked using I-FGSM as:

$x_{adv}^{t+1} = \mathrm{clip}_{\epsilon}\Big(x_{adv}^{t} + \alpha \cdot \mathrm{sign}\big(\nabla_x J(T(x_{adv}^{t}; p), y_{true})\big)\Big). \qquad (8)$

Here $p$ is the ratio of transformed images to the total number of images in the augmented dataset. MDIFGSM is a variant which incorporates the momentum term into DIFGSM to stabilize the direction of the gradients. The overall update for MDIFGSM is similar to MI-FGSM, with Equation 4 replaced by:

$g^{t+1} = \mu \cdot g^{t} + \dfrac{\nabla_x J(T(x_{adv}^{t}; p), y_{true})}{\big\|\nabla_x J(T(x_{adv}^{t}; p), y_{true})\big\|_1}. \qquad (9)$
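To illustrate the diverse-input transform $T(x; p)$ used in Eqs. 8 and 9, here is a hedged sketch that randomly resizes an image and zero-pads it back to its original size with probability $p$; the size range and padding scheme are assumptions rather than the exact settings of [33].

```python
# Illustrative diverse-input transform T(x; p): random resize + random zero
# padding, applied with probability p. `x` is a 4D tensor (N, C, H, W).
import random
import torch
import torch.nn.functional as F

def diverse_input(x, p=0.5, out_size=299, min_size=270):
    if random.random() > p:
        return x  # with probability (1 - p), leave the image untouched
    size = random.randint(min_size, out_size - 1)
    resized = F.interpolate(x, size=(size, size), mode="nearest")
    pad_total = out_size - size
    left = random.randint(0, pad_total)
    top = random.randint(0, pad_total)
    # Zero-pad back to out_size x out_size at a random offset.
    return F.pad(resized, (left, pad_total - left, top, pad_total - top), value=0.0)
```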
Fig. 2: Illustration of mapping a sample image from its low-resolution manifold to the high-resolution manifold. Adversarial images, which otherwise lie off the manifold, are mapped into the same domain as clean natural images, thereby recovering their true labels.
Fig. 3: Effect of super-resolution on the frequency distribution of a sample image. The magnitude spectrum of each image is generated using the discrete cosine transform (DCT). After removing low-frequency components from the image spectrum (i.e., high-pass filtering), the inverse DCT (IDCT) is used to visualize the high-frequency components. The IDCT of the recovered image shows the selective high-frequency components that are added by single-image super-resolution. The adversarial perturbations were produced using MDIFGSM with a maximum perturbation size of 16.

2.3 Adversarial Defenses

Tramèr et al. [18] proposed ensemble adversarial training, which regularizes the network by softening the decision boundaries, thereby encompassing nearby adversarial images. Defensive distillation [15] improves model robustness in a similar fashion by retraining a given model using soft labels acquired through a distillation mechanism [37]. Kurakin et al. [17] augmented a training batch of clean images with corresponding adversarial images to improve robustness. Moosavi-Dezfooli et al. [38], however, showed that adversarial examples can also be generated for an already adversarially trained model.

Recently, several defense methods have been proposed in the input image transformation domain. Data compression (JPEG image compression) as a defense was studied in [27, 28]. JPEG compression deploys the discrete cosine transform to suppress the human-imperceptible high-frequency noisy components. Guo et al. [36], however, noted that JPEG compression alone is far from an effective defense. They proposed image transformations using quilting and Total Variance Minimization (TVM). Feature squeezing [39] reduces the image resolution, either by bit-depth reduction or by smoothing filters, to limit the adversarial space. A foveation-based method was proposed by Luo et al. [29], which shows robustness against weak attacks like L-BFGS [11] and FGSM [12]. Another work closely related to ours is that of Prakash et al. [23], which deflects attention by carefully corrupting less critical image pixels. This introduces new artifacts which reduce the image quality and can result in mis-classification; to handle such artifacts, BayesShrink denoising in the wavelet domain is used. It has been shown that denoising in the wavelet domain yields superior performance to other techniques such as bilateral, anisotropic, TVM and Wiener-Hunt de-convolution [23]. Also closely related is the work of Xie et al. [16], which performs image transformations by randomly re-sizing and padding an image before passing it through a CNN classifier. Xie et al. [25] showed that adding adversarial patterns to a clean image results in noisy activation maps, and proposed a defense mechanism that performs feature denoising using non-local means, which requires retraining the model end-to-end with adversarial data augmentation. One of the main shortcomings of the aforementioned defense techniques (JPEG compression, pixel deflection and the foveation-based method) is that the transformations degrade the image quality, which results in the loss of significant image information.

3 Proposed Perturbed Image Restoration

Existing defense mechanisms against adversarial attacks aim to reduce the effect of added perturbations so as to recover the correct image class. Defenses have been developed along two main directions: (i) modifying the image classifier $\mathcal{C}(\cdot)$ to $\mathcal{C}^{\star}(\cdot)$ such that it recovers the true label for an adversarial example, i.e., $\mathcal{C}^{\star}(x_{adv}) = y_{true}$; and (ii) transforming the input image such that $\mathcal{C}(\mathcal{T}(x_{adv})) = y_{true}$, where $\mathcal{T}(\cdot)$ is an image transformation function. Ideally, $\mathcal{T}(\cdot)$ should be a model-agnostic, complex and non-differentiable function, making it harder for the adversary to circumvent the transformed model by back-propagating the classifier error through it.

Our proposed approach, detailed below, falls under the second category of defense mechanisms. We propose to use image restoration techniques to purify perturbed images. The approach has two components, which together form a non-differentiable pipeline that is difficult to bypass. As an initial step, we apply wavelet denoising to suppress any noise patterns. The central component of our approach is the super-resolution operation, which enhances the pixel resolution while simultaneously removing adversarial patterns. Our experiments show that image super-resolution alone is sufficient to reinstate classifier beliefs towards the correct categories; the wavelet denoising step, however, provides added robustness since it is a non-differentiable operation.

In the following sections, we first explain the super-resolution approach (Sec. 3.1), followed by a description of the denoising method (Sec. 3.2). Finally, we summarize the defense scheme in Sec. 3.3.

Fig. 4: Feature maps from a res block of an ImageNet-trained ResNet-50 for a clean image, its adversarial counterpart and the recovered image. The adversarial perturbation was produced using FGSM. Image super-resolution essentially nullifies the effect of the adversarial patterns added by the adversary.

3.1 Super Resolution as a Defense Mechanism

Our goal is to defend a classification model against perturbed images generated by an adversary. Our approach is motivated by the manifold assumption [40], which postulates that natural images lie on low-dimensional manifolds. This explains why low-dimensional deep feature representations can accurately capture the structure of real datasets. Perturbed images are known to lie off the low-dimensional manifold of natural images, which is approximated by deep networks [41]. Gong et al. [42] showed that a simple binary classifier can successfully separate off-the-manifold adversarial images from clean ones, and thereby concluded that adversarial and clean data are not twins, despite appearing visually identical. Fig. 2 shows a low-dimensional manifold of natural images. Data points from a real-world dataset (say ImageNet) are sampled from a distribution of natural images and can be considered to lie on the manifold. Such images are referred to as in-domain [43]. Corrupting these in-domain images by adding adversarial noise takes them off the manifold. A model that learns to map off-the-manifold images back onto the manifold can go a long way in detecting and defending against adversarial attacks. We propose to use image super-resolution as such a mapping function to remap off-the-manifold adversarial samples onto the natural image manifold, and validate our proposal through experimentation (Sec. 4.1). In this manner, robustness against adversarial perturbations is achieved by enhancing the visual quality of images. This approach provides remarkable benefits over other defense mechanisms that truncate critical information to achieve robustness.

Super-resolution Network: A required characteristic for defense mechanisms is the ability to destroy the fraudulent perturbations added by an adversary. Since these perturbations are generally high-frequency details, we use a super-resolution network that explicitly uses residual learning to focus on such details. These details are added to the low-resolution inputs in each residual block to eventually generate a high-quality, super-resolved image. The network considered in this work is the Enhanced Deep Super-Resolution (EDSR) network [44], which uses a hierarchy of such residual blocks. While our proposed approach achieves competitive performance with other super-resolution and up-sampling techniques, we demonstrate the added efficacy of the residual-learning-based EDSR model through extensive experiments.
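As a concrete illustration of this residual-learning design, below is a minimal PyTorch sketch of an EDSR-style residual block (no batch normalization inside the block); the channel width and residual scaling factor are assumptions based on the EDSR paper [44], not the exact configuration used here.

```python
# Minimal EDSR-style residual block: conv -> ReLU -> conv with a skip
# connection and no batch normalization (channel width and residual scaling
# are assumed values).
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # The block learns high-frequency residual detail and adds it back to its input.
        return x + self.res_scale * self.body(x)
```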

Effect on Spectral Distribution: The underlying assumption of our method is that deep super-resolution networks learn a mapping function that is generic enough to map a perturbed image onto the manifold of its corresponding class of images. This mapping function, learned with deep CNNs, essentially models the distribution of real, non-perturbed image data. We validate this assumption by analyzing the frequency-domain spectra of the clean, adversarial and recovered images in Fig. 3. It can be observed that the adversarial image contains high-frequency patterns, and that the super-resolution operation injects further high-frequency patterns into the recovered image. This achieves two major benefits: first, the newly added high-frequency patterns smooth the frequency response of the image (column 5, Fig. 3) and, second, super-resolution destroys the adversarial patterns that seek to fool the model.

Effect of adversarial perturbations on feature maps: Adversarial attacks add small perturbations to images which are often imperceptible to the human eye, or generally perceived as small noise in the pixel space. However, this adversarial noise is amplified in the feature maps of a convolutional network, leading to substantial noise [25]. Fig. 4 shows the feature maps, taken from a ResNet-50 res block after the activation layer, for three clean images, their adversarial counterparts and the defended images. The features for a clean image sample are activated only at semantically significant regions of the image, whereas those for its adversarial counterpart are also focused on semantically irrelevant regions. Xie et al. [25] performed feature denoising using non-local means [45] to improve the robustness of convolutional networks; their model is trained end-to-end on adversarially perturbed images. Our defense technique recovers the feature maps (Cols. 2 and 4, Fig. 4) without requiring any model retraining or adversarial image data augmentation.

Advantages of the proposed method: Our proposed method offers a number of advantages. (a) The proposed approach is agnostic to the attack algorithm and the attacked model. (b) Unlike many recently proposed techniques, which degrade critical image information as part of their defense, our method improves image quality while simultaneously providing a strong defense. (c) The proposed method does not require any learning and only uses a fixed set of parameters to purify input images. (d) It does not hamper the classifier's performance on clean images. (e) Due to its modular nature, the proposed approach can be used as a pre-processing step in existing deep networks. Furthermore, our purification approach is equally applicable to other computer vision tasks beyond classification, such as segmentation and object detection.

3.2 Wavelet Denoising

Since all adversarial attacks add noise to an image in the form of well-crafted perturbations, an efficient image denoising technique can go a long way in mitigating, if not altogether removing, the effect of these perturbations. Image denoising in the spatial or frequency domain causes a loss of textural detail, which is detrimental to our goal of achieving clean-image-like performance on denoised images. Denoising in the wavelet domain has gained popularity in recent works: it yields better results than various other techniques including bilateral, anisotropic, Total Variance Minimization (TVM) and Wiener-Hunt de-convolution [23]. The main principle behind wavelet shrinkage is that the Discrete Wavelet Transform (DWT) of real-world signals is sparse. This can be exploited to our advantage, since the ImageNet dataset [46] contains images that capture real-world scenes and objects. Consider an adversarial example $x_{adv} = x_c + \rho$; the wavelet transform of $x_{adv}$ is a linear combination of the wavelet transforms of the clean image and of the noise. Unlike image smoothing, which removes the higher-frequency components of an image, the DWT of real-world images has large coefficients corresponding to significant image features, and noise can be removed by applying a threshold to the smaller coefficients.

3.2.1 Thresholding

The thresholding parameter determines how effectively we shrink the wavelet coefficients and remove adversarial noise from an image. In practice, two types of thresholding are used: (a) hard thresholding and (b) soft thresholding. Hard thresholding is a non-linear technique in which each coefficient $w$ is individually compared to a threshold value $t$, as follows:

$\hat{w} = \begin{cases} w, & |w| \ge t \\ 0, & |w| < t. \end{cases}$

Reducing the small noisy coefficients to zero and then carrying out an inverse wavelet transform produces an image which retains critical information and suppresses the noise. Unlike hard thresholding, where the coefficients larger than $t$ are fully retained, soft thresholding modifies the coefficients as follows:

$\hat{w} = \mathrm{sign}(w) \cdot \max\big(|w| - t,\; 0\big).$

In our method, we use soft thresholding, as it avoids the abrupt sharp changes that otherwise occur with hard thresholding. Hard thresholding also over-smooths the image, which reduces classification accuracy on clean, non-adversarial images.
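The two thresholding rules can be written compactly as follows (a NumPy sketch for illustration):

```python
# Hard vs. soft thresholding of wavelet coefficients (NumPy sketch).
import numpy as np

def hard_threshold(w, t):
    # Keep coefficients whose magnitude reaches the threshold, zero out the rest.
    return np.where(np.abs(w) >= t, w, 0.0)

def soft_threshold(w, t):
    # Shrink every coefficient toward zero by t (the rule used in our defense).
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)
```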

Choosing an optimal threshold value is the underlying challenge in wavelet denoising. A very large threshold discards larger wavelet coefficients as well, resulting in an over-smoothed image; in contrast, a small threshold allows even the noisy wavelets to pass, thus failing to produce a denoised image after reconstruction. Universal thresholding is employed in VisuShrink [47], which determines the threshold for an image with $N$ pixels as $t = \sigma_{\rho}\sqrt{2 \ln N}$, where $\sigma_{\rho}$ is an estimate of the noise level. BayesShrink [48] is an efficient method for wavelet shrinkage which employs a separate threshold for each wavelet sub-band by assuming Gaussian noise. Suppose $w_{adv} = w_c + w_{\rho}$ is the wavelet transform of an adversarial image; since $x_c$ and $\rho$ are mutually independent, the variances $\sigma_{adv}^{2}$, $\sigma_{c}^{2}$ and $\sigma_{\rho}^{2}$ of $w_{adv}$, $w_c$ and $w_{\rho}$, respectively, follow $\sigma_{adv}^{2} = \sigma_{c}^{2} + \sigma_{\rho}^{2}$. A wavelet sub-band variance for an adversarial image is estimated as

$\hat{\sigma}_{adv}^{2} = \frac{1}{M}\sum_{m=1}^{M} w_{m}^{2},$

where $w_m$ are the sub-band wavelet coefficients and $M$ is the total number of wavelet coefficients in the sub-band. The threshold value for BayesShrink soft-thresholding is then given as

$t = \frac{\hat{\sigma}_{\rho}^{2}}{\hat{\sigma}_{c}}, \quad \text{where} \quad \hat{\sigma}_{c} = \sqrt{\max\big(\hat{\sigma}_{adv}^{2} - \hat{\sigma}_{\rho}^{2},\, 0\big)}.$

In our experiments, we explored both VisuShrink and BayesShrink soft-thresholding and found the latter to perform better and provide visually superior denoising.
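In practice, BayesShrink soft-thresholding of this form is available off the shelf; the sketch below uses scikit-image's denoise_wavelet as one possible realization (parameter names may vary across library versions, and the exact settings are assumptions rather than our released configuration).

```python
# One possible realization of wavelet-domain BayesShrink soft-thresholding,
# using scikit-image (recent versions); exact parameters are assumptions.
from skimage.restoration import denoise_wavelet

def wavelet_denoise(img_rgb, sigma=None):
    """img_rgb: float RGB image in [0, 1] of shape (H, W, 3)."""
    return denoise_wavelet(
        img_rgb,
        sigma=sigma,            # noise estimate; None lets the library estimate it
        mode="soft",            # soft thresholding (Sec. 3.2.1)
        method="BayesShrink",   # per-sub-band thresholds
        convert2ycbcr=True,     # denoise luminance and chrominance separately
        channel_axis=-1,        # RGB channels are the last axis
    )
```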

3.3 Algorithmic Description

An algorithmic description of our end-to-end defense scheme is provided in Algorithm 1. We first suppress the effect of adversarial noise using soft wavelet denoising. This is followed by super-resolution, employed as a mapping function to enhance the visual quality of images. Super-resolving an image maps the adversarial example, which otherwise lies off the manifold in low-resolution space, back onto the natural image manifold in high-resolution space. The recovered image is then passed through the same pre-trained models on which the adversarial examples were generated. Our model-agnostic image transformation technique thus minimizes the effect of adversarial perturbations in the image domain while causing minimal depreciation in the classification accuracy of clean, non-adversarial images.

/* Image Denoising */
Input: Corrupted image $x_{adv}$
Output: Denoised image $\hat{x}$
1 Convert the RGB image to the YCbCr color space, where Y and CbCr represent luminance and chrominance respectively.
2 Convert the image to the wavelet domain using the discrete wavelet transform (DWT).
3 Remove noisy wavelet coefficients using BayesShrink soft-thresholding.
4 Invert the shrunken wavelet coefficients using the Inverse Wavelet Transform (IWT).
5 Revert the image back to RGB.
/* Image Super-Resolution */
Input: Denoised image $\hat{x}$
Output: Super-resolved image $x_{sr}$
6 Map the adversarial sample back onto the natural image manifold using a deep super-resolution network (EDSR).
7 Forward the recovered image to the attacked model for correct prediction.
Algorithm 1: Defending Against Adversarial Attacks with Image Restoration (Wavelet Denoising + Super-Resolution)
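The steps of Algorithm 1 can be stitched together as in the following hedged sketch, which reuses the hypothetical wavelet_denoise helper sketched in Sec. 3.2 and assumes a pre-trained super-resolution model (sr_model) and the attacked classifier as PyTorch callables; it is illustrative, not the released implementation.

```python
# End-to-end sketch of Algorithm 1. `wavelet_denoise` is the hypothetical
# helper from Sec. 3.2; `sr_model` is a pre-trained super-resolution network
# (e.g., EDSR) and `classifier` is the attacked model.
import torch

def defend_and_classify(img_rgb, sr_model, classifier):
    # Steps 1-5: wavelet-domain BayesShrink soft-thresholding.
    denoised = wavelet_denoise(img_rgb)
    # Step 6: super-resolve to map the sample back onto the natural image manifold.
    x = torch.from_numpy(denoised).permute(2, 0, 1).unsqueeze(0).float()
    with torch.no_grad():
        x_sr = sr_model(x)
        # Step 7: forward the recovered image to the (unmodified) attacked classifier.
        logits = classifier(x_sr)
    return logits.argmax(dim=1).item()
```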
Model Clean Images FGSM-2 FGSM-5 FGSM-10 I-FGSM DeepFool C&W MI-FGSM DIFGSM MDIFGSM
No Defense
Inception v-3 100 31.7 28.7 30.5 11.4 0.4 0.8 1.7 1.4 0.6
ResNet-50 100 12.2 7.0 6.1 13.4 1.0 0.1 0.4 0.3 0.2
Inception ResNet v-2 100 59.4 55.0 53.6 21.6 0.1 0.3 0.5 1.5 0.6
JPEG Compression (Das et al. [35])
Inception v-3 96.0 62.3 54.7 48.8 77.5 81.2 80.5 69.4 2.1 1.3
ResNet-50 92.8 57.6 49.0 42.9 74.8 77.3 81.3 70.8 0.7 0.4
Inception ResNet v-2 95.5 67.0 55.3 53.7 81.3 83.9 83.1 72.8 1.6 1.1
Random resizing + zero padding (Xie et al. [16])
Inception v-3 97.3 69.2 57.3 53.2 90.6 88.9 89.5 89.5 7.0 5.8
ResNet-50 92.5 66.8 55.7 48.8 88.2 90.9 87.5 88.0 6.6 4.2
Inception ResNet v-2 98.7 70.7 59.1 55.8 87.5 89.7 88.0 88.3 7.5 5.3
Quilting + Total Variance Minimization (Guo et al. [36])
Inception v-3 96.2 70.2 62.0 54.6 85.7 85.9 85.3 84.5 4.1 1.7
ResNet-50 93.1 69.7 61.0 53.3 85.4 85.0 84.6 83.8 3.6 1.1
Inception ResNet v-2 95.6 74.6 67.3 59.0 86.5 86.2 85.3 84.8 4.5 1.2
Pixel Deflection (Prakash et al. [23])
Inception v-3 91.9 71.1 66.7 58.9 90.9 88.1 90.4 90.1 57.6 21.9
ResNet-50 92.7 84.6 77.0 66.8 91.2 90.3 91.7 89.6 57.0 29.5
Inception ResNet v-2 92.1 78.2 75.7 71.6 91.3 88.9 89.7 89.8 57.9 24.6
Our work: Wavelet Denoising + Image Super Resolution
Inception v-3 97.0 94.2 87.9 79.7 96.2 96.1 96.0 95.9 67.9 31.7
ResNet-50 93.9 86.1 77.2 64.9 92.3 91.5 93.1 92.0 60.7 31.9
Inception ResNet v-2 98.2 95.3 87.4 82.3 95.8 96.0 95.6 95.0 69.8 35.6
TABLE I: Performance comparison with state-of-the-art defense mechanisms on 5000 images from the ILSVRC validation set. The images are selected such that the respective classifier achieves 100% accuracy on the clean versions. Our proposed defense consistently achieves superior performance across three different models and various adversarial attacks.

4 Experiments

Models and Datasets: We evaluate our proposed defense and compare it with existing methods on three different classifiers: Inception v-3, ResNet-50 and Inception ResNet v-2. For these models, we obtain ImageNet pre-trained weights from TensorFlow's GitHub repository (https://github.com/tensorflow/models/tree/master/research/slim) and do not perform any re-training or fine-tuning. The evaluations are done on a subset of 5000 images from the ILSVRC [46] validation set. The images are selected such that the respective model achieves 100% top-1 accuracy on the clean, non-attacked images: evaluating defense mechanisms on already mis-classified images is not meaningful, since an attack on a mis-classified image is considered successful by definition. We also perform experiments on the NIPS 2017 Competition on Adversarial Attacks and Defenses DEV dataset [49], collected by the Google Brain organizers. An ImageNet pre-trained Inception v-3 model achieves 95.9% top-1 accuracy on the NIPS 2017 DEV images (see Table II).

Attacks: We generate attacked images using different techniques, including the Fast Gradient Sign Method (FGSM) [12], iterative FGSM (I-FGSM) [13], Momentum Iterative FGSM (MI-FGSM) [30], DeepFool [31], Carlini and Wagner (C&W) [32], Diverse Input Iterative FGSM (DIFGSM) and Momentum Diverse Input Iterative FGSM (MDIFGSM) [33]. We use publicly available implementations of these methods: Cleverhans [50] (https://github.com/tensorflow/cleverhans), Foolbox [51] (https://github.com/bethgelab/foolbox) and the code provided by the authors of [30, 33] (https://github.com/dongyp13/Non-Targeted-Adversarial-Attacks, https://github.com/cihangxie/DI-2-FGSM). For FGSM, we generate attacked images with perturbation sizes of 2, 5 and 10 (denoted FGSM-2, FGSM-5 and FGSM-10), and for iterative attacks the maximum perturbation size is restricted to 16. All attacks are carried out in white-box settings, since adversarial attacks are less transferable for larger datasets like ImageNet.

Defenses: We compare our proposed defense with a number of recently introduced state-of-the-art schemes in the literature. These include JPEG compression [35], random resizing and padding [16], image quilting + total variance minimization [36] and Pixel Deflection (PD) [23]. We use the publicly available implementations of these methods (https://github.com/poloclub/jpeg-defense, https://github.com/cihangxie/NIPS2017_adv_challenge_defense, https://github.com/facebookresearch/adversarial_image_defenses, https://github.com/iamaaditya/pixel-deflection). All experiments are run on the same set of images and against the same attacks for a fair comparison.

For our experiments, we explore two broad categories of Single Image Super-Resolution (SISR) techniques: (i) interpolation-based methods and (ii) deep learning (DL) based methods. Interpolation-based methods like Nearest Neighbor (NN), bi-linear and bi-cubic upsampling are computationally efficient, but not particularly robust against stronger attacks (DIFGSM and MDIFGSM). Recently proposed DL based methods have shown superior performance in terms of Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM) and mean squared error (MSE). Here, we consider three DL based SISR techniques: (i) super-resolution using a ResNet model (SR-ResNet) [52], (ii) the Enhanced Deep Residual Network for SISR (EDSR) [44] and (iii) super-resolution using Generative Adversarial Networks (SR-GAN) [52]. Our experiments show that EDSR consistently performs better. EDSR builds on a residual learning [1] scheme that specifically focuses on high-frequency patterns in the images. Compared with the original ResNet, EDSR achieves substantial improvements by removing the Batch Normalization layers (from each residual block) and the ReLU activations (outside the residual blocks).

4.1 Manifold Assumption Validation

In this paper we propose that clean and adversarial examples lie on different manifolds and that super-resolving an image to a higher-dimensional space remaps the adversarial sample back onto the natural image manifold.

To validate this assumption, we fine-tune an ImageNet pre-trained Inception v-3 model as a binary classifier using 10,000 pairs of clean and adversarial examples (generated with all the aforementioned attack techniques). We re-train the top two blocks while freezing the rest, with the learning rate reduced by a factor of 10. The global average pooling layer of the model is followed by a batch normalization layer, a drop-out layer and two dense layers (1024 and 1 nodes, respectively). This model efficiently leverages the subtle differences between clean images and their adversarial counterparts and separates the two with a very high accuracy (99.6%). To further validate our assumption about super-resolution, we test our defended images using this binary classifier. The classifier labels around 91% of the super-resolved images as clean, confirming that a vast majority of restored samples lie on the natural image manifold.
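A hedged Keras sketch of this binary classifier is shown below; the number of frozen layers, dropout rate and optimizer settings are assumptions, since only the head architecture and the reduced learning rate are specified above.

```python
# Binary (clean vs. adversarial) classifier used to probe the manifold
# assumption: ImageNet-pre-trained Inception v-3 with a GAP -> BatchNorm ->
# Dropout -> Dense(1024) -> Dense(1) head. Freezing depth, dropout rate and
# optimizer are assumptions.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
for layer in base.layers[:-60]:              # fine-tune only the top blocks (assumed cut-off)
    layer.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(1024, activation="relu")(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # clean (0) vs. adversarial (1)

model = tf.keras.Model(base.input, out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # reduced learning rate
              loss="binary_crossentropy", metrics=["accuracy"])
```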

In Fig. 1, we plot the features extracted from the last layer of the binary classifier to visualize the manifold assumption. We reduce the dimensionality of the features to 3 for visualization (retaining 90% of the variance) using Principal Component Analysis.

Attack No Defense Das et al. [35] Xie et al. [16] Guo et al. [36] Prakash et al. [23] Ours (SR) Ours (WD + SR)
Clean 95.9 89.7 92.0 88.8 86.5 90.4 90.9
FGSM-2 22.1 58.3 65.2 68.3 70.7 87.1 87.5
FGSM-5 20.0 50.2 52.7 58.0 62.9 79.6 79.9
FGSM-10 23.1 43.5 47.5 50.5 54.2 69.8 70.1
I-FGSM 10.1 75.8 85.3 80.9 86.2 89.7 90.1
DeepFool 1.0 77.0 84.7 80.1 84.2 90.2 90.4
C&W 0.3 76.3 84.8 80.3 84.9 90.5 90.7
MI-FGSM 1.4 72.4 83.6 78.2 84.0 89.4 89.8
DIFGSM 1.7 2.0 5.1 3.1 54.6 48.9 63.8
MDIFGSM 0.6 1.3 4.0 1.8 20.4 26.1 28.7

TABLE II: Top-1 accuracy (%) comparison of different defense mechanisms on the NIPS-DEV dataset with the Inception v-3 model.
Attack No Defense SR-ResNet [52] SR-GAN [52] EDSR [44]
Clean 100.0 94.0 92.3 96.2
FGSM-2 31.7 89.5 85.7 92.6
FGSM-5 28.7 83.7 80.1 85.7
FGSM-10 30.5 69.9 69.0 73.3
I-FGSM 11.4 93.4 91.0 95.9
DeepFool 0.4 93.2 93.0 95.5
C&W 0.8 93.3 91.3 95.6
MI-FGSM 1.7 92.6 87.6 95.2
DIFGSM 1.4 54.3 48.9 57.2
MDIFGSM 0.6 24.9 23.0 27.1
TABLE III: Performance comparison of various super-resolution techniques in the literature; all methods use the same up-scaling factor. Top-1 accuracies (%) are reported.

4.2 Results and Analysis

Table I shows the destruction rates of various defense mechanisms on 5000 ILSVRC validation-set images. The destruction rate is defined as the ratio of successfully defended images [13]; a destruction rate of 100% implies that all images are correctly classified after applying the defense mechanism. Note that we define destruction rates in terms of top-1 classification accuracy, which makes defending against attacks more challenging, since the exact class label must be recovered. 'No Defense' in Table I shows the model performance on the generated adversarial images; a lower value under 'No Defense' indicates a stronger attack. The results show that iterative attacks are better at fooling the model than single-step attacks. Iterative attacks, however, are not transferable and are easier to defend against. Similarly, targeted attacks are easier to defend against than their non-targeted counterparts, as they tend to over-fit the attacked model [32]. Considering them weak attacks, we therefore only report the performance of our defense scheme against the more generic non-targeted attacks.

For the iterative attacks (C&W and DeepFool), both Random Resizing + Padding and PD achieve similar performance, successfully recovering about 90% of the images; in comparison, our proposed super-resolution based defense recovers about 96% of the images. For the single-step attack category, Random Resizing + Padding fails to defend, as also noted in [16]; to overcome this limitation, an ensemble model with adversarial augmentation is used for defense. Compared with the JPEG compression based defense [28], our proposed method achieves a substantial performance gain of about 31% for FGSM. In the single-step attack category (e.g., FGSM-10), our defense model outperforms Random Resizing + Padding and PD by large margins of 26.7% and 21.0%, respectively. For the recently proposed strong attack MDIFGSM, all competing defense techniques (JPEG compression, Random Resizing + Padding, Quilting + TVM and PD) largely fail, recovering only 1.3%, 5.8%, 1.7% and 21.9% of the images, respectively. In comparison, the proposed image super-resolution based defense recovers up to 35.6% of the images (Table I).

Transform: Nearest Neighbor | Bi-linear | Bi-cubic (three strategies each: US, US→DS, DS→US)
Attack US US→DS DS→US US US→DS DS→US US US→DS DS→US
Clean 94.9 93.5 84.1 94.3 91.1 86.2 93.9 89.1 86.0
FGSM-2 74.2 25.9 23.8 73.5 24.0 21.4 71.2 20.3 19.5
FGSM-5 61.0 18.6 18.1 60.5 18.1 17.0 54.8 18.7 18.0
FGSM-10 52.9 16.8 16.0 50.1 15.7 15.4 49.2 16.2 15.8
I-FGSM 86.4 45.9 43.0 83.4 41.9 40.6 82.1 37.5 35.6
DeepFool 87.3 43.0 41.2 80.6 40.7 39.7 80.1 34.5 30.1
C&W 82.5 44.4 41.2 80.1 41.9 37.6 79.3 39.6 36.0
MI-FGSM 81.4 41.0 38.0 80.0 42.9 40.3 80.7 41.2 39.8
DIFGSM 36.0 5.8 3.9 34.8 6.1 4.9 31.2 7.8 5.6
MDIFGSM 16.1 3.8 2.0 10.2 3.9 3.5 9.6 2.0 1.7
TABLE IV: Performance of Nearest Neighbor, Bi-linear and Bi-cubic image resizing techniques as a defense. Evaluation is done on NIPS-DEV dataset using a pretrained Inception v-3 model. US: Upsample; DS: Downsample. The US and DS factor is 2.

We show a further performance comparison of our proposed defense with other methods on the NIPS-DEV dataset in Table II. Here, we report results only on Inception v-3, following the standard evaluation protocol of the competition [49]. Inception v-3 is a strong classifier, and we expect the results to generalize to other classifiers. The results in Table II show the superior performance of the proposed method.

4.3 Ablation Study

Super-resolution Methods: Image super-resolution recovers off-the-manifold adversarial images from low-resolution space and remaps them to high-resolution space. This should hold true for different super-resolution techniques in the literature. In Table III, we evaluate the effectiveness of three image super-resolution techniques: SR-ResNet [52], SR-GAN [52] and EDSR [44]. Specifically, attacked images are super-resolved without any wavelet denoising, and experiments are performed with the Inception v-3 classifier. The results in Table III show comparable performance across the evaluated super-resolution methods, demonstrating the effectiveness of super-resolution for recovering images.

Inception v-3 model ResNet-50 model Inception ResNet v-2 model
Attack No Defense WD SR WD+SR No Defense WD SR WD+SR No Defense WD SR WD+SR
Clean 100 94.0 96.2 97.0 100 92.7 93.2 93.9 100 94.0 97.2 98.2
FGSM-2 31.7 57.3 92.6 94.2 12.2 41.4 85.4 86.1 59.4 70.5 91.8 95.3
FGSM-5 28.7 36.4 85.7 87.9 7.0 12.7 74.0 77.2 55.0 57.5 85.7 87.4
FGSM-10 30.5 32.7 73.3 79.7 6.1 8.6 60.5 64.9 53.6 55.4 79.4 82.3
I-FGSM 11.4 76.4 95.9 96.2 13.4 71.2 91.0 92.3 21.6 82.6 94.3 95.8
DeepFool 0.4 74.9 95.5 96.1 1.0 71.8 89.3 91.5 0.1 79.1 95.4 96.0
C&W 0.8 76.3 95.6 96.0 0.1 79.0 92.0 93.1 0.3 81.0 94.0 95.6
MI-FGSM 1.7 77.0 95.2 95.9 0.4 71.2 89.6 92.0 0.5 80.6 93.0 95.0
DIFGSM 1.4 18.3 57.2 67.9 0.3 17.9 49.8 60.7 1.5 11.9 57.6 69.8
MDIFGSM 0.6 5.8 27.1 31.7 0.2 9.4 22.4 31.9 0.6 6.9 29.4 35.6
TABLE V: Individual contributions of Wavelet Denoising (WD) and Super-Resolution (SR) to the proposed defense scheme across three different classifiers. The proposed defense scheme works well across a range of classifiers.

Besides state-of-the-art image super-resolution methods, we also document results for enhancing image resolution using interpolation-based techniques. For this, we perform experiments by resizing the images with Nearest Neighbor, bi-linear and bi-cubic interpolation. In Table IV, we report the results achieved by three different strategies: upsample by a factor of 2 (US), upsample followed by downsample (US→DS) and downsample followed by upsample (DS→US), as sketched below. The results show that, although the performance of simple interpolation-based methods is inferior to the more sophisticated super-resolution techniques in Table III, simple interpolation-based image resizing is surprisingly effective and achieves some degree of defense against adversarial attacks.
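A minimal PyTorch sketch of the three interpolation strategies, assuming 4D image tensors and a resizing factor of 2:

```python
# Interpolation-based resizing baselines from Table IV (factor 2): US, US+DS
# and DS+US with nearest/bilinear/bicubic resampling. `x` is (N, C, H, W).
import torch.nn.functional as F

def resize_defense(x, mode="bicubic", strategy="US"):
    h, w = x.shape[-2:]
    kwargs = {} if mode == "nearest" else {"align_corners": False}
    if strategy == "US":            # upsample only
        return F.interpolate(x, scale_factor=2, mode=mode, **kwargs)
    if strategy == "US+DS":         # upsample, then back to the original size
        up = F.interpolate(x, scale_factor=2, mode=mode, **kwargs)
        return F.interpolate(up, size=(h, w), mode=mode, **kwargs)
    # "DS+US": downsample, then back to the original size
    down = F.interpolate(x, scale_factor=0.5, mode=mode, **kwargs)
    return F.interpolate(down, size=(h, w), mode=mode, **kwargs)
```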

Effect of Wavelet Denoising: Our proposed defense first applies wavelet denoising, which aims to minimize the effect of adversarial perturbations, followed by image super-resolution, which selectively introduces high-frequency components into the image (as seen in Fig. 3) and recovers off-the-manifold attacked images. Here we investigate the individual contribution of these two modules to defending against adversarial attacks. We perform extensive experiments on three classifiers: Inception v-3, ResNet-50 and Inception ResNet v-2. Table V shows the top-1 accuracy of each model under different adversarial attacks. The results show that, while wavelet denoising helps suppress the added adversarial noise, the major performance boost comes from image super-resolution; the best performance is achieved when wavelet denoising is followed by super-resolution. These empirical evaluations demonstrate that image super-resolution combined with wavelet denoising is a robust, model-agnostic defense against both iterative and non-iterative attacks.

Hyper-parameter Selection: Unlike many existing defense schemes, which require computationally expensive model re-training and parameter optimization [16, 17, 18, 24], our proposed defense is training-free and does not require tuning a large set of hyper-parameters. It has only two hyper-parameters: the super-resolution scaling factor and the BayesShrink noise-level estimate $\sigma$. We perform a linear search over the scaling factor for one single-step (FGSM-2) and one iterative (C&W) attack on images randomly selected from the ILSVRC validation set. These experiments are performed on the Inception v-3 model. Table VI shows the classifier performance across different super-resolution scaling factors. We select a scaling factor of 2, since it clearly shows superior performance; higher scaling factors introduce a significant high-frequency component into the image, which degrades performance. For the noise-level estimate $\sigma$, we follow [23].


Attack No Defense ×2 ×3 ×4
Clean 100 97.2 79.0 59.2
FGSM 31.7 92.9 76.2 58.8
C&W 0.3 95.8 77.7 58.9

TABLE VI: Selection of the super-resolution scaling factor. A scaling factor of 2 is selected due to its superior performance.

CAMs Visualization: Class Activation Mapping (CAM) [53] is a weakly supervised localization technique that helps interpret the predictions of a CNN by visualizing the discriminative regions in an image. CAMs are generated by replacing the last fully connected layer with a global average pooling (GAP) layer; a class-specific weighted combination of the feature maps feeding the GAP yields a heat map that localizes the discriminative regions responsible for the predicted class label. Figs. 5 and 6 show the CAMs for the top-1 prediction of the Inception v-3 model on clean, attacked and recovered image samples. It can be observed that mapping an adversarial image to a higher resolution destroys most of the noisy patterns, recovering CAMs similar to those of the clean images. Row 5 of Figs. 5 and 6 shows the perturbations added to the clean image samples. Super-resolving an image selectively adds high-frequency components that eventually help recover the model's attention towards the discriminative regions corresponding to the correct class labels (Row 6, Figs. 5 and 6).
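As an illustration, a CAM can be computed from the final convolutional feature maps and the classifier's fully connected weights as follows (a sketch assuming a GAP-plus-single-FC head; the variable names are hypothetical):

```python
# Class Activation Map sketch for a network ending in GAP + one FC layer.
# `features`: final conv feature maps of shape (C, H, W); `fc_weights`: FC
# weight matrix of shape (num_classes, C).
import torch

def class_activation_map(features, fc_weights, class_idx):
    # Weight each feature map by the FC weight of the target class and sum over channels.
    cam = torch.einsum("c,chw->hw", fc_weights[class_idx], features)
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)   # normalize to [0, 1] for visualization
```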

5 Conclusion

Adversarial perturbations can seriously compromise the security of deep learning based models. This can have wide repercussions, since the recent success of deep learning has led to these models being deployed in a broad range of important applications, from health-care to surveillance. Designing robust defense mechanisms that can counter adversarial attacks without degrading performance on unperturbed images is therefore essential. In this paper, we presented an image restoration scheme based on super-resolution that maps off-the-manifold adversarial samples back onto the natural image manifold. We showed that the primary reason super-resolution networks can negate the effect of adversarial noise is their addition of high-frequency information to the input image. Our proposed defense pipeline is agnostic to the underlying model and attack type, does not require any learning, and operates equally well in black-box and white-box attack settings. We demonstrated the effectiveness of the proposed defense compared with state-of-the-art defense schemes, which it outperforms by a considerable margin.

Fig. 5: Visualization of the defense against a single-step attack (FGSM). The first row shows four clean images. The subsequent three rows show the class activation maps for the clean, FGSM-attacked and recovered images. The second-last row shows the perturbations (magnified for visibility) added to the clean images by FGSM, and the last row shows the difference between the clean and defended images (magnified for visibility).
Fig. 6: Visualization of the defense against an iterative attack (C&W). The first row shows four clean images. The subsequent three rows show the class activation maps for the clean, C&W-attacked and recovered images. The second-last row shows the perturbations (magnified for visibility) added to the clean images by the C&W attack, and the last row shows the difference between the clean and defended images (magnified for visibility).

References