GraCIAS: Grassmannian of Corrupted Images for Adversarial Security

05/06/2020 ∙ by Ankita Shukla, et al. ∙ Arizona State University, Indian Institute of Technology Delhi, IIIT Delhi

Input transformation based defense strategies fall short in defending against strong adversarial attacks. Some successful defenses either increase the randomness within the applied transformations or make the defense computationally intensive, making it substantially more challenging for the attacker. However, this limits the applicability of such defenses as a pre-processing step, much like computationally heavy approaches that rely on retraining and network modifications to achieve robustness to perturbations. In this work, we propose a defense strategy that applies random image corruptions to the input image alone, constructs a self-correlation based subspace, and then applies a projection operation to suppress the adversarial perturbation. Due to its simplicity, the proposed defense is computationally efficient compared to the state-of-the-art, and yet can withstand large perturbations. Further, we develop proximity relationships between the projection operator of a clean image and that of its adversarially perturbed version, via bounds relating geodesic distance on the Grassmannian to matrix Frobenius norms. We empirically show that our strategy is complementary to other weak defenses like JPEG compression and can be seamlessly integrated with them to create a stronger defense. We present extensive experiments on the ImageNet dataset across four different models, namely InceptionV3, ResNet50, VGG16 and MobileNet, with the perturbation magnitude set to ϵ = 16. Unlike state-of-the-art approaches, even without any retraining, the proposed strategy achieves an absolute improvement of approximately 4.5%.


1 Introduction

Adversarial attacks are small imperceptible perturbations carefully crafted to modify an image so that it misleads the classification ability of state-of-the-art classifiers. Robustness of deep neural networks to such attacks [5, 16, 3, 26] rapidly became an active area of research due to the increasing adoption of deep learning based systems in practical applications, often with high reliability and security requirements. Since these applications range from autonomous driving [29] to medical evaluations [2], the robustness of these models to adversarial perturbations is a crucial aspect of their reliability.

Consequently, recent years have witnessed the development of both adversaries [16, 5] that challenge the robustness of deep models and defense strategies designed to mitigate their effect. White box attacks have emerged as the most challenging form of adversarial attacks, where the adversary has access to model parameters, training data, as well as the defense strategy. In order to provide security against such attacks, various approaches [11, 15, 5, 28, 24, 27, 17, 19] have focused on improving the model's robustness by modifying the network, the loss function and/or the training strategy. While these approaches have been successful to a great extent, many of them have either model or attack dependencies. Moreover, defenses that rely on adversarial training, i.e., the use of adversarially perturbed samples at train time, typically incur significant computational costs. For example, the recently proposed feature denoising approach [28] required synchronized training on 128 NVIDIA V100 GPUs for 52 hours to train a baseline ResNet-152 model on ImageNet. While there have been efforts to reduce this training time, e.g., [19], which achieved a 5× speed-up in training, it still required 128 TPUs. In most practical scenarios, such hardware infrastructure is not readily accessible, and even when it is, these approaches assume access to the model and the training data. On the other hand, there exist alternate defense strategies that do not require knowledge of the model or the training data and are therefore more widely applicable.

To make existing systems more robust to adversarial attacks, several approaches employ simple pre-processing or post-processing strategies [6, 18, 22, 13] that can be augmented with the deployed models directly at inference time. For example, input transformations like JPEG compression, bit depth reduction and image quilting pre-process an image before feeding it to the network to reduce the effect of adversarial noise. The success of these transforms is due to gradient obfuscation, which results in incorrect or undefined gradients and limits the impact of the typical gradient based white-box attacker. However, Athalye et al. [1] overcame this shortcoming by computing gradient approximations in such scenarios, drastically dropping the performance of many defense strategies, and often completely defeating them (0% classification accuracy on adversarial samples).

Recently, the work in [20] proposed a heuristic approach, Barrage of Random Transforms (BaRT), achieving state-of-the-art defense performance with the ResNet50 model on the ImageNet dataset. The defense applies a subset of image transforms in random order, selected randomly from a large pool of stochastic image transformations (e.g., FFT, swirl and contrast, to name a few). The key insight from BaRT's approach is that this compounded randomness drastically inhibits the adversary's capability to correctly mimic the defense behavior. However, to cope with the drastic changes in the input image caused by the large variations in the transforms required by BaRT, the network is fine-tuned to ensure the model's familiarity with the defense at inference time. On the other hand, Dubey et al. [4] defend against adversarial attacks by averaging the model prediction probabilities of the k-nearest neighbors of a given sample. Interestingly, they perform the nearest neighbor search in an unrelated, web-scale dataset with billions of images, where the aforementioned averaging of prediction probabilities is aimed at projecting the adversarial test example back onto the image manifold. These approaches either need to fine-tune the model [20] or need access to an external database [4], and may not strictly qualify as input transformation based defenses.

Figure 1: Left: Representation of subspaces as points on the Grassmannian manifold. The subspace corresponding to the perturbed sample lies close to the subspace of its clean sample counterpart; the distance between these two subspaces is shown to be upper bounded as given by Eq. (8). Centre and Right: The histograms show that subspaces of a pair of images of the same class are closer than subspaces of an image pair formed from different classes. Given an adversarial sample, the plot highlights that the geodesic distance between a clean sample's subspace and the subspace of its adversarially perturbed counterpart (same class) is smaller than the distance between subspaces of samples from two different classes. The plot is shown for 8000 similar (same class) and 8000 dissimilar (different class) pairs. The normalized histogram for these pairs is shown for two models on the ImageNet dataset: InceptionV3 (Centre) and ResNet50 (Right).

To achieve our goal of devising an input transformation based defense, we make two crucial observations from the discussion above that contribute to undermining the white-box adversary's ability to generate an attack. First is the compounded randomness in the transformations applied to the input image. Second is the access to a set of similar images (identified as neighbors in the web-scale database in the case of Dubey et al. [4]) to have an averaging or smoothing effect over predictions, leading to more accurate predictions for an adversarial sample. Contrary to [20, 4], we take an approach that relies only on the given sample and leverages benefits from both the compounded random transformations and the smoothing. However, the random transforms and smoothing are performed in a principled manner that reduces the impact of adversarial noise without significant changes to the image. This makes our approach a model-agnostic, inference-time defense that does not rely on additional data or training to achieve the desired goal of adversarial security.

Our proposed approach, Grassmannian of Corrupted Images for Adversarial Security (GraCIAS), applies a random number of randomized filtering operations to the input test image. These filtered images provide a basis for a lower dimensional subspace, which is then used for smoothing the input image. Because the generated image corruptions used to define the subspace have a principled structure, projection and reconstruction of the input image are possible without substantial loss of information. Furthermore, we can interpret these subspaces as points on the Grassmann manifold, which permits us to derive an upper bound on the geodesic distance between the subspaces obtained by filtering a clean sample and its adversarially perturbed counterpart. This is also supported by empirical analysis, which suggests that the geodesic distances between the subspaces corresponding to clean and adversarial examples belonging to the same class are smaller than those corresponding to examples from different classes. Figure 1 illustrates the subspace representation and shows that the distributions of geodesic distances computed between subspace pairs of the same class and of different classes are reasonably separable. This observation is central to our approach and validates that our choice of filters ensures a low-dimensional subspace that is representative of the test sample's original class and serves as an appropriate smoothing operator for the input samples. Through extensive experiments on the ImageNet dataset, we show the effectiveness of GraCIAS on several models under attacks of various strengths. We summarize our contributions below:

  • The proposed input transformation based defense achieves state-of-the-art results on the ImageNet dataset for ResNet50, InceptionV3, VGG16 and MobileNet models under different attacker strengths in the white box attack scenario.

  • As opposed to state-of-the-art randomized input transformation approaches, GraCIAS benefits not only from its intrinsic random parametrization, but also from the theoretical motivation that suggests retention of task-relevant information and suppression of adversarial noise.

  • Due to its simplicity and computational efficiency, the proposed defense can be integrated with existing weak defenses like JPEG compression to create stronger defenses, as shown in our experiments.

2 Previous Work

The vulnerability of neural networks to adversarial perturbations has led to the growth of a large number of defense strategies. Given the large volume of work in this area, we categorize the literature into two broad groups and briefly review recent developments in them.

Robust Training and Network Modification. Robust training refers to strategies that retrain a model with an augmented training set. The most popular strategies use adversarial training [15, 5, 11], where adversarially perturbed samples are included in the training set on the fly, i.e., during model training. These approaches, while effective, are computationally very expensive, as the attacks have to be regenerated multiple times during the entire training process. Other approaches [28, 27] modify the network architecture to achieve adversarial robustness. Xie et al. [28] added feature denoising blocks to the model to circumvent the impact of noisy feature maps caused by adversarial perturbations at the input, whereas [27] transformed the layer-wise convolutional feature maps into a new manifold using non-linear radial basis functions. However, both robust training and network modification not only require access to model parameters and training data, but also the computational resources to perform the retraining, which may be nontrivial to obtain. This requirement poses a bottleneck in securing already deployed systems built on deep learning models. Therefore, in case of restrictions on computational resources or access to model parameters, add-on defenses in the form of pre-processing blocks at the input or output of a network are a viable alternative.

Input Transformations. The limitations of the previous category are addressed by input transformation based approaches, which aim to denoise the image before feeding it to the network for classification. Most transformation based defenses, like the ones proposed in [6] (e.g., image compression, bit depth reduction, image quilting), lead to obfuscated gradients, a form of gradient masking that gives a false sense of security. Such defenses have limited robustness on a given model under powerful attacks [1]. Other approaches that have been successful to some extent under strong attacks, like the Barrage of Random Transforms (BaRT) [20], follow a highly randomized strategy that chooses transforms at random from an enormous pool, making the defense difficult for the attacker to break. BaRT requires the model to be fine-tuned on the input transformations to reduce the drop in performance on clean samples. Another defense in this category [13] improves standard JPEG compression to tackle adversarial attacks without a significant drop in performance on clean samples. Other approaches assume access to the training set: they either learn valid range spaces of clean samples, or use the training set to approximate the image manifold onto which the input image is projected [22, 24].

Our proposed approach GraCIAS is also an input transformation based approach. We emphasize that, unlike most existing approaches, our defense is agnostic to both the model and the training set used.

3 Proposed Approach

Figure 2: An overview of the GraCIAS defense applied to an adversarial sample. A random number of random filters is used to create a set of corrupted images. These images are used to estimate a random low-dimensional subspace, which is used to obtain a low-dimensional representation of the input, followed by re-projection to image space to obtain the rectified image.

Given a trained deep network $f$, an adversary can add an imperceptible perturbation to the input sample that forces the model to make a wrong prediction. For a given sample $x$, an adversary generates a sample $x_{adv} = x + \delta$ such that its predicted label does not match that of the original sample, i.e., $f(x_{adv}) \neq y$. Thus, the objective of the attacker can be written as

$f(x + \delta) \neq y \quad \text{subject to} \quad \|\delta\|_p \leq \epsilon$

Here, $y$ is the ground truth label of sample $x$, $\delta$ is the added perturbation and $\epsilon$ is the perturbation bound.
Design Goal: The goal of an input transformation based defense strategy is to ‘clean’ an adversarial sample before feeding it into a classification network. The ‘cleaning’ should reduce mis-classifications due to the perturbation, while maintaining performance on clean samples. In this work, we propose an input transformation-based inference time approach that is simple and methodically randomized to achieve effective defense.

To achieve this design goal, our defense strategy GraCIAS combines simple randomized corruptions, subspace projections and a geometric perspective on input transformations. Figure 2 shows an overview of our approach, which we describe in detail in the following sections.

3.1 Proposed Defense Strategy

The process of generating the transform to rectify an adversarial sample is described as follows.

Image Set for Subspace Approximation. The aim of an input defense strategy is to find a transform that can estimate a clean sample from a given adversarial sample. As opposed to the Barrage of Random Transforms [20], we focus on developing a random transform that is minimal without compromising its effectiveness. Our transform comprises a projection step from image space to a low dimensional space and a reconstruction step that projects back to image space. The first step is to generate the set of images required for estimating the subspace onto which the adversarial sample is projected. To this end, we use random image filtering to generate several noisy versions of a given adversarial sample $x_{adv}$. For $N$ such random filters $k_1, \ldots, k_N$, the set of blurred images can be written as

$\mathcal{X} = \{\, x_{adv} * k_i \mid i = 1, \ldots, N \,\}$   (1)

This step essentially corresponds to mixing a uniformly structured noise with the non-uniform noise in the image caused by the adversarial perturbation. The mixing is achieved by multiple convolutions with kernels $k_i$ whose weights are drawn from a uniform distribution and normalized to have unit norm, i.e., $\|\mathrm{vec}(k_i)\| = 1$, where $\mathrm{vec}(\cdot)$ is the vectorization operator over the indices of the kernel. In addition to the random kernels themselves, the number $N$ of such kernels, and hence the number of corrupted images, is also picked at random from a fixed range of values. This choice of randomizing both the filter kernels and their number is driven by our goal of increasing the randomness in the defense. Such random filters are more effective than parametric filters like Gaussian blur, whose parametric nature is much easier for an attacker to approximate.
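As a concrete illustration of this corruption step, the sketch below generates random kernels with uniformly distributed weights and convolves them with the input. It is a minimal sketch, not the authors' implementation: the kernel size, the exact normalization of the kernel weights (here they are made to sum to one) and the range [10, 60] for the number of filters (the range reported in Section 4.1) are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def corrupted_image_set(x_adv, n_min=10, n_max=60, ksize=7, rng=None):
    """Build the set of randomly filtered copies of the input, as in Eq. (1).

    x_adv : 2-D array holding one image channel (apply per channel for RGB).
    ksize : kernel size, a free parameter here (see the ablation in Table 3).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_filters = int(rng.integers(n_min, n_max + 1))   # random number of kernels N
    corrupted = []
    for _ in range(n_filters):
        k = rng.uniform(size=(ksize, ksize))          # uniformly distributed weights
        k /= k.sum()                                  # normalize (assumed: weights sum to 1)
        corrupted.append(convolve(x_adv, k, mode="constant"))  # zero boundary condition
    return np.stack(corrupted)                        # shape (N, H, W)
```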
Randomized Subspace for Projection. As $\mathcal{X}$ is derived from random filtering of the input image itself, the span of its elements is likely to retain some information relevant for the end task. So we simply find an orthogonal basis for the subspace spanned by the elements of $\mathcal{X}$. The projection of the input image onto this subspace has a blurring effect on the adversarial perturbation mixed with the random noise. In the absence of adversarial perturbations, i.e., for clean samples, a similar blurring is expected along with retention of task-relevant information. Computing the basis for the subspace is computationally inexpensive even for high resolution datasets like ImageNet, as the set of corrupted images is fairly small.

Re-projection into Image Space. The final step is to reconstruct the image from the low dimensional mapping obtained in the previous step. As in PCA based reconstruction, using the low dimensional mapping and the inverse transform, we obtain the restored image. The basis of the subspace captures the relevant image content in the first few leading principal components, while the noisy components are captured by the later ones. Thus, restricting the image to a low dimensional subspace filters out the noisy components. However, a fixed subspace dimension could easily be estimated by a white box adversary, weakening the defense. Therefore, the dimension of the subspace is defined by retaining a specific amount of variance in the data, where the variance value is selected randomly from a predefined range. This adds another level of randomness to the choice of subspace dimension, making the defense effective in the presence of an adaptive attacker that is aware of the defense strategy.
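The projection and re-projection steps can be sketched as a PCA style reconstruction, with the subspace dimension chosen by a randomly drawn retained-variance level (the 60%–95% range follows Section 4.1). This is a minimal sketch under those assumptions; variable and function names are illustrative.

```python
import numpy as np

def gracias_reconstruct(x_adv, corrupted, var_min=0.60, var_max=0.95, rng=None):
    """Project the input onto the random low-dimensional subspace and map it back.

    corrupted : (N, H, W) stack produced by the filtering step.
    x_adv     : (H, W) input image (one channel).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = corrupted.shape
    A = corrupted.reshape(n, -1)                     # rows: vectorized corrupted images
    mean = A.mean(axis=0)
    _, s, Vt = np.linalg.svd(A - mean, full_matrices=False)

    # randomly chosen fraction of variance to retain -> subspace dimension d
    target = rng.uniform(var_min, var_max)
    energy = np.cumsum(s**2) / np.sum(s**2)
    d = int(np.searchsorted(energy, target)) + 1

    B = Vt[:d]                                       # orthonormal basis of the subspace
    coeffs = B @ (x_adv.reshape(-1) - mean)          # low-dimensional representation
    x_rec = B.T @ coeffs + mean                      # re-projection to image space
    return x_rec.reshape(h, w)
```

In practice the two sketches would be chained per channel to form the full input transform applied before the classifier.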

3.2 Validity of Proposed Subspace

We now present an analysis showing that the subspace estimated from an adversarial example is close to the subspace created from its clean image counterpart. Our theoretical result is an upper bound on the geodesic distance between the subspaces constructed from the clean sample $x$ and the adversarially perturbed sample $x_{adv} = x + \delta$.

Let $X$ and $X_{adv}$ be the matrices whose columns are the convolutions of $x$ and $x_{adv}$ with the random kernels, i.e., $X = [\mathrm{vec}(x * k_1), \ldots, \mathrm{vec}(x * k_N)]$ and $X_{adv} = [\mathrm{vec}(x_{adv} * k_1), \ldots, \mathrm{vec}(x_{adv} * k_N)]$. We represent their column spaces as $\mathcal{S}$ and $\mathcal{S}_{adv}$ for the clean and perturbed image respectively; both are $N$-dimensional subspaces of $\mathbb{R}^n$, where $N \ll n$. To compare these linear $N$-dimensional subspaces of $\mathbb{R}^n$ we make use of the Grassmann manifold $\mathcal{G}(N, n)$, an analytic manifold in which each point represents an $N$-dimensional subspace of $\mathbb{R}^n$ regardless of the specific basis of the subspace. The distance between the two subspaces is then given by the geodesic distance between the corresponding points on the Grassmann manifold. The normalized shortest geodesic distance is defined as follows

$d(\mathcal{S}_{adv}, \mathcal{S}) = \dfrac{\|\Theta\|_2}{\frac{\pi}{2}\sqrt{N}}$   (2)

Here, $\Theta = (\theta_1, \ldots, \theta_N)$ is the vector of principal angles between the two subspaces and $\frac{\pi}{2}\sqrt{N}$ is the maximum possible distance on $\mathcal{G}(N, n)$ [10, 12]. It was shown in [10] that this normalized distance is upper bounded by the following expression

$d(\mathcal{S}_{adv}, \mathcal{S}) \leq \|\sin\Theta\|_F$   (3)
$\leq \|X_{adv}^{\dagger}\|_2 \, \|X_{adv} - X\|_F$   (4)

Here, $\|X_{adv}^{\dagger}\|_2$ is the spectral norm of the pseudo-inverse of $X_{adv}$, and $\|\cdot\|_F$ denotes the Frobenius norm. Squaring both sides of (3)–(4), we get

$d^2(\mathcal{S}_{adv}, \mathcal{S}) \leq \|X_{adv}^{\dagger}\|_2^2 \, \|X_{adv} - X\|_F^2$   (5)

Here, $\|X_{adv}^{\dagger}\|_2^2$ corresponds to the inverse of the smallest eigenvalue of $X_{adv}^{\top} X_{adv}$, i.e., $\|X_{adv}^{\dagger}\|_2$ is the inverse of the smallest singular value $\sigma_{\min}(X_{adv})$. Hence, we can write the following

$\|X_{adv}^{\dagger}\|_2^2 = \dfrac{1}{\sigma_{\min}^2(X_{adv})}$   (6)

In the case of natural images, $\sigma_{\min}(X_{adv})$ is non-zero. The eigenvalues of a blurred image decay faster than those of the clean image, as blurred images are dominated by low frequency components [7]. Similarly, the other factor on the right side of (5) is given by

$\|X_{adv} - X\|_F^2 = \sum_{i=1}^{N} \|K_i \delta\|_2^2$   (7)

Here, the $K_i$'s are the BTTB (block-Toeplitz with Toeplitz blocks) matrices representing convolution with the kernels $k_i$, which are full rank under the zero boundary condition for convolution, and $\delta = x_{adv} - x$ is the adversarial noise added to the clean image. Thus, substituting (6) and (7) in (5), we get

$d^2(\mathcal{S}_{adv}, \mathcal{S}) \leq \dfrac{\sum_{i=1}^{N} \|K_i \delta\|_2^2}{\sigma_{\min}^2(X_{adv})}$   (8)

The bound in the above equation establishes that the subspaces of the clean and adversarial samples are in close proximity, as long as the singular value term is bounded above. To show the latter, we adopt the following approach. It is not possible to provide a general bound on $\sigma_{\min}(X_{adv})$ without assuming something about natural image statistics. On the other hand, it is easy to see that $\sigma_{\min}(X_{adv})$ is small only for pathological examples. Consider the case $\sigma_{\min}(X_{adv}) = 0$. This happens if and only if the columns of $X_{adv}$ are linearly dependent, that is, there exist scalars $\alpha_1, \ldots, \alpha_N$, not all zero, such that

$\sum_{i=1}^{N} \alpha_i \, (x_{adv} * k_i) = 0$   (9)

Using simple Fourier transform arguments, it can be shown that the above happens only under pathological cases, such as when $x_{adv}$ is a constant image or the filters are all plain delta functions. For any general situation, $\|X_{adv}^{\dagger}\|_2$ is finite, although a general lower bound on $\sigma_{\min}(X_{adv})$ is hard to find.
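The geodesic quantities above can be checked numerically. The sketch below computes the normalized geodesic distance of Eq. (2) via the principal angles between the two column spaces, and evaluates the right-hand side of the bound in (8); it assumes the column matrices $X$ and $X_{adv}$ built from the filtering step, and the function names are illustrative.

```python
import numpy as np

def normalized_geodesic_distance(X, X_adv):
    """Normalized Grassmann geodesic distance between span(X) and span(X_adv), as in Eq. (2)."""
    Q1, _ = np.linalg.qr(X)                                   # orthonormal bases of the
    Q2, _ = np.linalg.qr(X_adv)                               # two subspaces
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), -1.0, 1.0)
    theta = np.arccos(cosines)                                # principal angles
    N = X.shape[1]
    return np.linalg.norm(theta) / (0.5 * np.pi * np.sqrt(N))

def bound_rhs(X, X_adv):
    """Right-hand side of Eq. (8): ||X_adv - X||_F^2 / sigma_min(X_adv)^2."""
    sigma_min = np.linalg.svd(X_adv, compute_uv=False).min()
    return np.linalg.norm(X_adv - X, "fro") ** 2 / sigma_min ** 2
```

For any particular draw of the random kernels, the square of the first quantity should stay below the value returned by the second, mirroring the inequality in (8).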

The random filtering and the subspace projection reduce the effect of adversarial noise. Further, in the above discussion, we have established that the subspaces derived from clean samples and those from their adversarially perturbed counterparts are close to each other. If such a subspace $\mathcal{S}$ is representative of the clean sample, i.e., if it captures information relevant to the end task, then a nearby subspace like $\mathcal{S}_{adv}$ is also likely to retain similar information. Therefore, a projection and reconstruction operation on such a subspace ($\mathcal{S}$ or $\mathcal{S}_{adv}$) will achieve our objective of reducing adversarial noise while retaining relevant information. In addition to this proximity guarantee in (8), our empirical analysis of geodesics in Fig. 1 shows that the subspace obtained from an adversarial sample is closer to that of its clean counterpart than to that of a clean image from a different class.

This observation along with our proximity result is the basis of our main hypothesis: The subspaces resulting from our randomly corrupted versions of the input image are sufficiently representative to retain task-relevant information and yet are effective transformations in attenuating adversarial noise. In the following sections, we validate this hypothesis through extensive empirical evaluation and analysis experiments.

4 Results

In this section, we first evaluate our defense strategy as a pre-processing step at inference time on adversarial samples generated with different perturbation magnitudes and different levels of attacker knowledge. We then present ablation experiments to thoroughly evaluate the choice of parameters used in the proposed defense strategy. We now describe the dataset, models and attack methods used to evaluate the performance of our defense strategy.

4.1 Experimental Setup

Datasets and Models. We present a series of experiments to evaluate the effectiveness of our defense strategy.

The ImageNet [21] validation set consists of 50,000 images of different sizes distributed across 1000 categories; the images are processed to a fixed input dimension and encoded with 24-bit color. We refer to this entire set as ImageNet-50K and to the subset of the first 10,000 images as ImageNet-10K in our experiments. For ImageNet [21], we evaluate the performance on InceptionV3 [25], ResNet50 [8], MobileNet [9] and VGG16 [23], using the pre-trained models available in TensorFlow.

Comparison with other Approaches. We compare our defense strategy with several input transformations that are summarized below:

JPEG compression, BitDepth reduction [6] and JPEGDNN [14]: These defenses are applied to an image only at inference time. Like our approach, they are model agnostic and hence can easily be integrated with existing systems. We used the framework of [1] and the authors' implementations from GitHub to evaluate these defenses under different attack scenarios. The JPEG defense is performed at a compression quality level of 75 (out of 100), and images are reduced to 3 bits for the BitDepth defense in all our experiments. For JPEGDNN, we used the default parameters provided with the authors' GitHub implementation.

Barrage of Random Transforms (BaRT) [20]: We compare the state-of-the-art performance of BaRT on the ResNet50 model with our defense strategy on ImageNet-50K. As the authors' implementation is not publicly available, we use the results reported in their paper in our comparisons. While we outperform BaRT on the accuracy of attacked images by a significant margin, our performance on clean images is lower. However, it is important to point out that BaRT's model is fine-tuned for an additional 100 epochs with their transformed images, whereas we use the original model.

Parameters in GraCIAS. Our defense strategy has two steps that require a random parameter choice. The first is the number of filters used to create the set of corrupted images given by Eq. (1). To limit the computational cost, it is chosen from a fixed range, set to [10, 60], i.e., a minimum of 10 corrupted images and a maximum of 60. The other random parameter that adds to the robustness of the proposed defense is the dimensionality of the subspace defined on the set of images mentioned earlier. The dimension is computed from the percentage of data variance retained while computing the PCA basis; to avoid information loss and drastic image changes, the variance is selected at random between 60% and 95%. The third parameter is the kernel size for the filtering operation, which is fixed for the ImageNet dataset.

4.2 White Box Attacker

The white box adversary has access to the model parameters, training data and trained weights to generate adversarial samples. We evaluate the performance of different models under FGSM [5] and PGD [15] attacks with the ℓ∞ distance. PGD attacks with different numbers of iterations are denoted by PGDk; for example, PGD with 10 iterations is denoted PGD10.
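For reference, a minimal sketch of the ℓ∞-bounded PGDk attack against a generic TensorFlow classifier; the step size, the ε value (16 on a 0–255 scale, i.e., 16/255 for inputs in [0, 1]) and the use of a sparse cross-entropy loss are illustrative choices, not the authors' exact attack code.

```python
import tensorflow as tf

def pgd_attack(model, x, y, eps=16/255., alpha=2/255., k=10):
    """PGDk: k signed-gradient steps, each projected back into the eps ell_inf ball around x."""
    x_adv = tf.identity(x)
    for _ in range(k):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)              # ascend the classification loss
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # project into the ell_inf ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv
```

FGSM corresponds to a single step of this loop with the step size set to ε.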

We evaluated the performance of the InceptionV3 model under the FGSM and PGD attacks; the results are presented in Table 1. The table also includes results for standard JPEG compression and BitDepth reduction, which have been shown to be effective when the defense is not known to the attacker. These transformations are applied at inference time to transform the sample before feeding it to the pre-trained InceptionV3 network. The recent work on DNN guided JPEG compression [14] was shown to achieve state-of-the-art results when compared with other input transformations like image quilting, standard JPEG and BitDepth. While JPEGDNN performs well in the low perturbation setting, its performance drops for the higher perturbation value reported in the table. Our approach outperforms the previously best known result, which used JPEGDNN as the input transformation.

Attack | No def | JPEG (ICLR'18) | BitDepth (ICLR'18) | JPEGDNN (CVPR'19) | GraCIAS (Ours)
FGSM | 22.88 | 25.32 | 24.76 | 26.21 | 37.63
PGD40 | 0.00 | 0.65 | 0.27 | 3.52 | 42.49
PGD100 | 0.00 | 0.00 | 0.23 | 3.01 | 44.32
Table 1: ImageNet-10K validation set: comparison of different input transformation based defenses on the InceptionV3 model. The table reports defense classification accuracy (%) under FGSM, PGD40 and PGD100 attacks at a fixed attack magnitude.
Figure 3: Performance comparison of various defenses across different perturbation magnitudes using (Left) FGSM and (Right) PGD10 attacks on the InceptionV3 model.
Figure 4: Performance comparison of different defense strategies under different magnitudes of FGSM attack on the ImageNet dataset for the VGG16 and ResNet50 models.

Performance across Different Perturbation Magnitudes. We evaluated the performance of our defense strategy over a range of perturbation magnitudes for both PGD and FGSM attacks and compared it with state-of-the-art approaches. The results are shown in Figure 3 for InceptionV3 and in Figure 4 for VGG16 and ResNet50. The results indicate that GraCIAS sustains its performance over a larger range of perturbations before dropping at very large magnitudes.

4.3 Adaptive Adversary

The recent work of [1] showed that existing input transformation based defense strategies can be broken by a strong attacker that also has access to the defense strategy. A non-differentiable defense can be defeated with BPDA (Backward Pass Differentiable Approximation), which approximates the gradient of the transformation layer with the identity during backpropagation to develop strong attacks. We show that the proposed defense strategy can withstand such attacks, as opposed to existing defense strategies. The results in Table 2 indicate that the proposed approach achieves state-of-the-art results on both the InceptionV3 and ResNet50 models, with a substantial improvement over previously reported results. We also validate the efficacy of our approach across different perturbation magnitudes and attack iterations in Fig. 5 and show that GraCIAS achieves non-trivial accuracy, as opposed to the current state-of-the-art input defense JPEGDNN.
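A sketch of how a BPDA adversary wraps a non-differentiable defense in TensorFlow: the defense is applied in the forward pass, while its gradient is replaced by the identity in the backward pass. `gracias_defense` here stands in for any input transformation (e.g., the two sketches from Section 3.1 chained together); this follows the general attack recipe of [1], not the authors' exact implementation.

```python
import tensorflow as tf

@tf.custom_gradient
def bpda_transform(x):
    """Forward: apply the (non-differentiable) defense. Backward: identity gradient."""
    x_def = tf.numpy_function(gracias_defense, [x], tf.float32)  # assumed defense function
    x_def.set_shape(x.shape)
    def grad(dy):
        return dy                                                # identity approximation
    return x_def, grad
```

Plugging `model(bpda_transform(x_adv))` into the PGD loop above gives a BPDA+PGD adversary of the kind evaluated in Table 2.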

Defense | Defense applied | InceptionV3 | ResNet50 | MobileNet | VGG16
JPEG [6] (ICLR'18) | Inference only | 9.82 | 0.0 | 0.0 | 0.0
JPEG DNN [14] (CVPR'19) | Inference only | 13.12 | 0.35 | 0.0 | 18.38
*BaRT k=5 [20] (CVPR'19) | Finetune + inference | NA | 16.0 | NA | NA
*BaRT k=10 [20] (CVPR'19) | Finetune + inference | NA | 36.0 | NA | NA
GraCIAS (Ours) | Inference only | 19.65 | 41.94 | 35.6 | 21.5
Table 2: ImageNet validation set: comparison of defense classification accuracy (%) under the BPDA attack (40 iterations) on the InceptionV3, ResNet50, MobileNet and VGG16 models. * indicates that the results are quoted from the respective paper, in the absence of an open source implementation.

Figure 5: (Left) InceptionV3 model under BPDA attack with different perturbation magnitudes on the ImageNet dataset. The plot highlights that GraCIAS improves over the results previously reported with JPEGDNN. (Right) Defense accuracy of the ResNet50 model under different numbers of BPDA attack iterations. While both ResNet50 and VGG16 are completely defeated at increased attacker strength, GraCIAS still achieves non-trivial defense performance.

To further validate the effectiveness of the proposed defense strategy, we also evaluate it against EOT (Expectation over Transformation), where the attacker aims to capture the randomness in the transform by applying the transformation multiple times and using the average gradient. However, owing to the presence of randomness at different steps of the transformation, the expectation over transformations fails to capture this randomness, even with as many as 100 runs: the defended InceptionV3 model retained non-trivial accuracy over 100 steps of EOT. Further, we also investigated the combination of the two, i.e., BPDA + EOT, where the defense achieved accuracy similar to that under BPDA alone.
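The EOT attacker averages gradients over repeated random draws of the defense. Below is a minimal sketch under the same assumptions as the BPDA example; the number of samples is an illustrative choice.

```python
import tensorflow as tf

def eot_gradient(model, x_adv, y, n_samples=30):
    """Average the BPDA gradient over repeated random applications of the defense."""
    grads = []
    for _ in range(n_samples):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = tf.keras.losses.sparse_categorical_crossentropy(
                y, model(bpda_transform(x_adv)))   # defense is re-sampled on every call
        grads.append(tape.gradient(loss, x_adv))
    return tf.add_n(grads) / float(n_samples)
```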

Effect on Clean Sample Accuracy. In the absence of a detector for adversarial examples, input transformation strategies are applied to both clean and adversarial samples, adversely affecting the performance on clean samples. To cope with this drop, the network can be fine-tuned with the proposed defense in place prior to applying it at inference time. With GraCIAS, the performance on clean samples drops by approximately 16% and 23% for the InceptionV3 and ResNet50 models on ImageNet without fine-tuning. However, this drop can be reduced with network fine-tuning, as suggested by BaRT, which fine-tunes its model to limit the loss in clean sample accuracy. Due to limited hardware resources, we verified this on a ResNet model for CIFAR10, where the fine-tuned model regained most of the drop, suggesting similar benefits for the ImageNet dataset when fine-tuning with our defense strategy.

5 Ablation Study

We now investigate the choice of parameters in our defense strategy. While the range for the percentage of data variance to be retained and the range for the number of filters are fixed across all experiments, parameters like the filter size can depend on the image dimensions. We also evaluate different image operations for creating the corrupted image set under the adaptive attack setting, to validate the effectiveness of random filters over other transforms.

Effect of Filter Size. The goal of our filtering operation is to develop a diverse yet informative set of images from which to estimate a subspace that retains task-relevant information. This effect can be verified from the results in Table 3 on the ImageNet dataset for two different perturbation magnitudes across three different filter sizes.

Operation | Defense Accuracy
Gaussian Filter | 17.11
Affine Transformation | 11.86
Symmetric Transformation | 7.29

Filter Size | Defense Accuracy (ϵ1) | Defense Accuracy (ϵ2)
3 | 17.11 | 16.41
5 | 17.43 | 18.90
7 | 22.38 | 19.65
Table 3: (Left) Effect of selecting different transforms to create the set of corrupted images given in Eq. (1) needed for our GraCIAS defense. (Right) Effect of filter size on defense performance on the ImageNet dataset at two different perturbation levels under an adaptive adversary (BPDA+PGD) with 100 iterations.

The results are reported in the adaptive attack setting (PGD + BPDA) to show the effect in the stronger attack scenario. They also suggest that, since most real-world applications involve higher image resolutions, the choice of filter size is not difficult.

Defense | InceptionV3 | ResNet50 | VGG16
JPEG | 5.58 | 39.2 | 18.9
GraCIAS + JPEG | 44.26 | 45.14 | 31.79
BitDepth | 0.42 | 39.38 | 27.95
GraCIAS + BitDepth | 25.97 | 43.79 | 29.89
Table 4: ImageNet-10K: performance of simple defenses with GraCIAS used as pre-processing at inference time on various models under the PGD10 attack. The boost in performance is indicative of GraCIAS's ability to restore image details, making it easier for much simpler defenses like JPEG and BitDepth to defend against the attack.

Random Filters vs. Other Transforms. We evaluated the effect of different image transforms for creating the set of images in Eq. (1) required for defining the subspace. The results are given in Table 3. The performance suffers a significant drop with affine and symmetric transformations, likely because the typically high frequency adversarial noise is still retained after such transformations.

GraCIAS as Pre-processing Prior to Other Defenses. JPEG compression and bit depth reduction are simple input transformations that are very easy to incorporate in real world systems, as they are model agnostic and can even be implemented in hardware; however, they become ineffective in the presence of large perturbations and adaptive adversaries. Since the proposed defense is also inexpensive in terms of computational resources, it can be used to complement these defenses and restore their defense capabilities. The results in Table 4 show the improvements on the ImageNet dataset under the PGD attack.
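A sketch of the chaining evaluated in Table 4, applying GraCIAS first and then a standard JPEG re-encode (quality 75 follows the JPEG setting in Section 4.1). `gracias_defense` is the assumed input transform from Section 3.1; names and dtypes are illustrative.

```python
import tensorflow as tf

def gracias_then_jpeg(x, quality=75):
    """Apply GraCIAS, then standard JPEG compression as a second, inexpensive defense."""
    x_rec = gracias_defense(x)                               # assumed: returns floats in [0, 1]
    x_uint8 = tf.image.convert_image_dtype(x_rec, tf.uint8, saturate=True)
    encoded = tf.io.encode_jpeg(x_uint8, quality=quality)    # lossy re-encode
    decoded = tf.io.decode_jpeg(encoded)
    return tf.image.convert_image_dtype(decoded, tf.float32)
```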

6 Conclusion

In this work, we proposed a simple, randomized, linear subspace based input defense that is applied at inference time to mitigate the effect of adversarial noise. The proposed approach achieved state-of-the-art results on the ImageNet dataset across four different deep classification networks. The proposed defense was extensively evaluated against attacks of different strengths and magnitudes and shown to be effective in all cases.

References

  • [1] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In International Conference on Machine Learning, pp. 274–283. Cited by: §1, §2, §4.1, §4.3.
  • [2] Y. Bar, I. Diamant, L. Wolf, and H. Greenspan (2015) Deep learning with non-medical training used for chest pathology identification. In Medical Imaging 2015: Computer-Aided Diagnosis, Vol. 9414, pp. 94140V. Cited by: §1.
  • [3] N. Carlini and D. Wagner (2017) Adversarial examples are not easily detected: bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. Cited by: §1.
  • [4] A. Dubey, L. v. d. Maaten, Z. Yalniz, Y. Li, and D. Mahajan (2019) Defense against adversarial images using web-scale nearest-neighbor search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8767–8776. Cited by: §1.
  • [5] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, External Links: Link Cited by: §1, §1, §2, §4.2.
  • [6] C. Guo, M. Rana, M. Cisse, and L. van der Maaten (2018) Countering adversarial images using input transformations. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2, §4.1, Table 2.
  • [7] P. C. Hansen, J. G. Nagy, and D. P. O’leary (2006) Deblurring images: matrices, spectra, and filtering. Vol. 3, Siam. Cited by: §3.2.
  • [8] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.1.
  • [9] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. Cited by: §4.1.
  • [10] S. Ji-guang (1987) Perturbation of angles between linear subspaces. Journal of Computational Mathematics, pp. 58–61. Cited by: §3.2.
  • [11] A. Kurakin, I. J. Goodfellow, and S. Bengio (2017) Adversarial machine learning at scale. Cited by: §1, §2.
  • [12] C. Li, Z. Shi, Y. Liu, and B. Xu (2014) Grassmann manifold based shape matching and retrieval under partial occlusions. In International Symposium on Optoelectronic Technology and Application 2014: Image Processing and Pattern Recognition, Vol. 9301, pp. 93012O. Cited by: §3.2.
  • [13] Z. Liu, Q. Liu, T. Liu, N. Xu, X. Lin, Y. Wang, and W. Wen (2019) Feature distillation: dnn-oriented jpeg compression against adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 860–868. Cited by: §1, §2.
  • [14] Z. Liu, Q. Liu, T. Liu, N. Xu, X. Lin, Y. Wang, and W. Wen (2019) Feature distillation: dnn-oriented jpeg compression against adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 860–868. Cited by: §4.1, §4.2, Table 2.
  • [15] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2, §4.2.
  • [16] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2574–2582. Cited by: §1, §1.
  • [17] A. Mustafa, S. Khan, M. Hayat, R. Goecke, J. Shen, and L. Shao (2019) Adversarial defense by restricting the hidden space of deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3385–3394. Cited by: §1.
  • [18] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer (2018) Deflecting adversarial attacks with pixel deflection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8571–8580. Cited by: §1.
  • [19] C. Qin, J. Martens, S. Gowal, D. Krishnan, K. Dvijotham, A. Fawzi, S. De, R. Stanforth, and P. Kohli (2019) Adversarial robustness through local linearization. In Advances in Neural Information Processing Systems, pp. 13824–13833. Cited by: §1.
  • [20] E. Raff, J. Sylvester, S. Forsyth, and M. McLean (2019) Barrage of random transforms for adversarially robust defense. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6528–6537. Cited by: §1, §1, §2, §3.1, §4.1, Table 2.
  • [21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252. External Links: Link, Document Cited by: §4.1.
  • [22] P. Samangouei, M. Kabkab, and R. Chellappa (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2.
  • [23] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, Cited by: §4.1.
  • [24] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman (2018) PixelDefend: leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2.
  • [25] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. External Links: Link, Document Cited by: §4.1.
  • [26] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1.
  • [27] S. A. Taghanaki, K. Abhishek, S. Azizi, and G. Hamarneh (2019) A kernelized manifold mapping to diminish the effect of adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11340–11349. Cited by: §1, §2.
  • [28] C. Xie, Y. Wu, L. v. d. Maaten, A. L. Yuille, and K. He (2019) Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509. Cited by: §1, §2.
  • [29] H. Xu, Y. Gao, F. Yu, and T. Darrell (2017) End-to-end learning of driving models from large-scale video datasets. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2174–2182. Cited by: §1.