Defense Against Adversarial Attacks with Saak Transform

08/06/2018 · by Sibo Song, et al.

Deep neural networks (DNNs) are known to be vulnerable to adversarial perturbations, which poses a serious threat to DNN-based decision systems. In this paper, we propose to apply the lossy Saak transform to adversarially perturbed images as a preprocessing tool to defend against adversarial attacks. The Saak transform is a recently proposed state-of-the-art method for computing joint spatial-spectral representations of input images. Empirically, we observe that outputs of the Saak transform are very discriminative in differentiating adversarial examples from clean ones. Therefore, we propose a Saak-transform-based preprocessing method with three steps: 1) transforming an input image to a joint spatial-spectral representation via the forward Saak transform, 2) filtering its high-frequency components, and 3) reconstructing the image via the inverse Saak transform. The processed image is found to be robust against adversarial perturbations. We conduct extensive experiments to investigate various settings of the Saak transform and the filtering functions. Without harming the decision performance on clean images, our method outperforms state-of-the-art adversarial defense methods by a substantial margin on both the CIFAR-10 and ImageNet datasets. Importantly, our results suggest that adversarial perturbations can be defended against effectively and efficiently with state-of-the-art frequency analysis.

I Introduction

Recent advances in deep learning have led to unprecedented success in many real-world computer vision problems such as face recognition, autonomous driving, and person re-identification [1, 2, 3]. However, it was first pointed out by Szegedy et al. [4] that deep neural networks (DNNs) can be easily fooled by adding carefully crafted adversarial perturbations to input images. These adversarial examples can trick deep learning systems into erroneous predictions with high confidence. It was further shown in [5] that such examples exist in the physical world. Even worse, adversarial attacks are often transferable [6, 7]; namely, one can generate adversarial attacks without knowing the parameters of the target model.

These observations have triggered broad interest in adversarial defense research to improve the robustness of DNN-based decision systems. Currently, defenses against adversarial attacks fall into two major categories. One is to mask the gradients of the target neural network by modifying it, e.g., by adding layers or changing loss/activation functions [8, 9, 10]. The other is to remove adversarial perturbations by applying transformations to the input data [11, 12, 13, 14]. Since these transformations are non-differentiable, it is difficult for adversaries to attack them through gradient-based methods.

In this work, we focus on adversarial attacks on convolutional neural networks (CNNs) and propose a defense method that maps input images into a joint spatial-spectral representation with the forward Saak transform [15], purifies this representation by filtering out high-frequency components, and then reconstructs the images. As illustrated in Fig. 1, the proposed mechanism is applied to images as a preprocessing step before they go through the CNN. We explore three filtering strategies and apply them to the transformed representations to effectively remove adversarial perturbations. The rationale is that, since adversarial perturbations are usually undetectable by the human visual system (HVS), reducing high-frequency components should remove adversarial noise without hurting the decision accuracy on clean data much, because the components that matter to the HVS are preserved in the restored images. We propose to use the Saak transform [15], a state-of-the-art tool for computing spatial-spectral representations, to perform this frequency analysis. Our empirical results demonstrate that Saak coefficients in high spectral dimensions are discriminative between adversarial and clean examples. Our algorithm is efficient since it demands neither adversarial training with label information nor modification of the neural network.
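To make the pipeline concrete, the sketch below expresses the three-step preprocessing in plain Python. The callables forward_saak, filter_coefficients and inverse_saak are hypothetical placeholders for the forward multi-stage Saak transform, the high-frequency filtering rule of Sec. III-D, and the inverse transform; they are not part of any released API.

```python
def defend(image, forward_saak, filter_coefficients, inverse_saak):
    """Three-step Saak preprocessing defense (conceptual sketch).

    The three callables are hypothetical stand-ins for the forward
    multi-stage Saak transform, a high-frequency filtering rule, and
    the inverse multi-stage Saak transform.
    """
    coeffs = forward_saak(image)          # joint spatial-spectral representation
    coeffs = filter_coefficients(coeffs)  # suppress high-frequency components
    return inverse_saak(coeffs)           # reconstruct the purified image
```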

Fig. 1: Demonstration of our proposed method to filter out the adversarial perturbations with the multi-stage Saak transform.

II Related Work

Adversarial attacks have been extensively studied since the earliest attempt [4], which generated L-BFGS adversarial examples by solving a box-constrained optimization problem to fool neural networks. Goodfellow et al. [16] developed an efficient fast gradient sign method (FGSM) that adds adversarial perturbations by computing the gradients of the cost function w.r.t. the input. Along this direction, Kurakin et al. [5] proposed a basic iterative method (BIM) that iteratively computes the gradients and takes a small step in that direction (instead of a single large step as in the FGSM). Later, Dong et al. [17] integrated a momentum term into the BIM to stabilize update directions. Papernot et al. [18] generated adversarial attacks by restricting the $L_0$-norm of perturbations. DeepFool (DF) [19] iteratively computes perturbations that push images across the decision boundary, which is linearized in the high-dimensional input space. It was further extended to fool a network with a single universal perturbation [20]. Carlini and Wagner [21] proposed three variants of adversarial attacks under $L_0$, $L_2$ and $L_\infty$ distance constraints. Chen et al. [22] generated a strong attack by adding elastic-net regularization to combine $L_1$ and $L_2$ penalty functions. Unlike the above-mentioned methods, Xiao et al. [23] proposed a novel method that crafts adversarial examples by applying locally smooth spatial transformations instead of focusing on pixel-level changes. Su et al. [24] presented a way to attack neural networks by changing only one pixel in each image through a differential evolution algorithm.
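As an illustration of the gradient-based attacks discussed above, the following PyTorch sketch implements FGSM and BIM in their basic untargeted form. It assumes a classification model that returns logits; the hyper-parameters eps, alpha and the number of steps are illustrative, not the values used in the cited papers.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM: perturb along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def bim(model, x, y, eps, alpha, steps):
    """BIM: repeat small FGSM steps, projecting back into an eps-ball of x."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = fgsm(model, x_adv, y, alpha)
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```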

Recently, a number of defense techniques have been proposed for detecting and defending against adversarial attacks. Papernot et al. [8] proposed a defensive distillation method that uses soft labels from a teacher network to train a student model. Gu and Rigazio [25] applied stacked denoising auto-encoders to reduce adversarial perturbations. Li and Li [26] used cascaded SVM classifiers to classify adversarial examples; they also showed that average filters can mitigate the adversarial effect. Tramer et al. [27] achieved good results by training networks with adversarial images generated from different models. Guo et al. [11] applied input transformations such as cropping, bit-depth reduction, JPEG compression and total variance minimization to remove adversarial perturbations. Similarly, Xu et al. [12] proposed several strategies, including median smoothing and bit-depth reduction, to destroy adversarial perturbations spatially. Dziugaite et al. [28] investigated the effect of JPEG compression on adversarial images. Building on this, Akhtar et al. [29] built a perturbation detector using the Discrete Cosine Transform (DCT). Very recently, Liu et al. [13] designed an adversarial-example-oriented quantization table to replace the default table in JPEG compression and thereby remove adversarial noise.

The Saak transform [15] provides an efficient, scalable and robust tool for image representation. It is a representation not derived by differentiation, which makes gradient-based or optimization-based attacks difficult to apply. The Saak transform has several advantages over the DCT in removing adversarial perturbations. First, Saak transform kernels are derived from the input data while those used in the DCT are data-independent; Saak kernels trained with clean data from a specific dataset are more effective in removing perturbation noise. Furthermore, the PCA is used in the Saak transform to remove statistical dependency among pixels, which is optimal in theory, whereas the DCT is known to be a low-complexity approximation to the PCA in achieving the desired whitening effect. Finally, the multi-stage Saak transform can preserve the prominent spatial-spectral information of the input data, which contributes to robust classification of attacked images. In this work, we investigate the Saak coefficients of each spectral component and observe that high-frequency channels contribute more to adversarial perturbations. We conduct experiments on the CIFAR-10 and ImageNet datasets and show that our approach outperforms all state-of-the-art approaches.

III Proposed Method

III-A Problem Formulation

Before introducing the proposed method, we formulate the problem of adversarial attacks and defenses with respect to a given neural network. A neural network is denoted by a mapping $f: \mathbb{R}^{h \times w \times c} \rightarrow \mathbb{R}^{k}$, where $x \in \mathbb{R}^{h \times w \times c}$ is an input image of size $h \times w \times c$ and $f(x) \in \mathbb{R}^{k}$ is the predicted output vector. Given neural network model $f$, clean image $x$ and its ground-truth label $y$, crafting an adversarial example denoted by $x_{adv}$ can be described as a box-constrained optimization problem:

$\min_{x_{adv}} \| x_{adv} - x \|_{p} \quad \text{s.t.} \quad f(x_{adv}) \neq y, \quad x_{adv} \in [0, 1]^{h \times w \times c},$   (1)

where $\| \cdot \|_{p}$ is the $L_p$ norm. Our goal is to find a transformation function $T$ to mitigate the adversarial effect of $x_{adv}$. In other words, we aim at obtaining a transformation $T$ such that predictions of the transformed adversarial examples are as close to the ground-truth labels as possible. Ideally, $f(T(x_{adv})) = y$. In most settings of recent attacks, an adversary has direct access to the model and attacks it by exploiting the gradients of the network w.r.t. the input. For this reason, a desired transformation $T$ should be non-differentiable. This makes attacks on the target model more challenging even if an attacker can access all parameters of $f$ and $T$.

III-B Image Transformation via Saak Coefficients Filtering

Fig. 2: From left to right: The distributions (histograms) of Saak coefficients of clean images and FGSM adversarial examples in four representative low and high spectral components are shown in (a) and (b), respectively. In (a), distributions of Saak coefficients belonging to low spectral dimensions 0 to 3 are shown. In (b), distributions of Saak coefficients belonging to high spectral dimensions 45 to 48 are shown. The y-axis represents the values of Saak coefficients. As shown in (b), for high spectral components, the distributions of Saak coefficients differ between clean and adversarial examples. In (c) and (d), the normalized and the original RMSE (root-mean-squared-error) values between clean and FGSM adversarial examples in different spectral components are shown. The y-axis in both figures represents the spectral dimension of Saak coefficients. In (c), we can observe that clean and adversarial examples have different coefficient values in high spectral dimensions. (These results are obtained with a two-stage Saak transform on local cuboids.)

Saak transformation of an image. As shown in the left part of Fig. 1, an image is decomposed into blocks of $2 \times 2$ pixels (or local cuboids (LCs) of shape $2 \times 2 \times K$, where $K$ is the spectral dimension). Then, the Karhunen-Loeve transform (KLT) is conducted on each block (or cuboid) by merging four spatial nodes into one parent node. This process is conducted recursively, stage by stage, in the forward multi-stage Saak transform. Note that $2 \times 2$ is a typical LC size used for illustration purposes; an arbitrary size can be used in practice. The signed KLT coefficients in each stage are called Saak coefficients. The Saak coefficients are discriminative features of the input image. One can define a multi-stage inverse Saak transform that converts cuboids of lower spatial resolution and higher spectral resolution to cuboids of higher spatial resolution and lower spectral resolution and, eventually, reconstructs an approximation of the image (or the original image), depending on whether the lossy (or lossless) Saak transform is adopted.
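A simplified, single-stage version of the forward transform can be sketched as follows. It learns KLT/PCA kernels from the 2x2 local cuboids of one input and applies the sign-to-position conversion. A faithful Saak implementation learns kernels from a training set, treats the DC component separately, and cascades several such stages, so this is only an illustration of the core operations, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def saak_stage_forward(cuboid, n_kernels=None):
    """Simplified single Saak stage on an (H, W, K) cuboid.

    2x2xK local cuboids are flattened, KLT/PCA kernels are learned from
    them (here, from the single input for illustration; normally from a
    training set), and the signed responses are rectified into a
    sign-to-position format (positive and negative parts become separate
    channels).
    """
    H, W, K = cuboid.shape
    blocks = cuboid.reshape(H // 2, 2, W // 2, 2, K).transpose(0, 2, 1, 3, 4)
    blocks = blocks.reshape(-1, 4 * K)                  # one row per 2x2xK cuboid
    pca = PCA(n_components=n_kernels or 4 * K).fit(blocks)
    coeffs = pca.transform(blocks)                      # signed KLT coefficients
    # Sign-to-position conversion: keep (+) and (-) responses as separate channels.
    rectified = np.concatenate([np.maximum(coeffs, 0),
                                np.maximum(-coeffs, 0)], axis=1)
    return rectified.reshape(H // 2, W // 2, -1), pca
```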

Saak coefficients as discriminative features. We compare the histograms of Saak coefficient values of clean (in blue) and adversarial images (in green) in four representative low and high spectral components in Figs. 2 (a) and (b), respectively. Following the experimental setup in [13], the adversarial examples are generated using the FGSM algorithm on the first 100 correctly classified samples in the CIFAR-10 test set. We remark that similar observations hold for other adversarial attacks. We see that clean and adversarial images share similar distributions in the lower spectral components. However, for high spectral components, adversarial examples have much larger variances than clean examples. These results indicate that high-frequency Saak coefficients have discriminative power in distinguishing adversarial from clean examples. In addition, we show the normalized and the original root-mean-square errors of the Saak coefficients between clean images and adversarial examples in Figs. 2 (c) and (d), respectively. After normalization by the range of the clean Saak coefficients, we see clearly that the difference between clean and adversarially perturbed images lies primarily in the high-frequency Saak coefficients. Based on the above observations, we propose to use the multi-stage Saak transform [15] as the transformation function $T$. As mentioned earlier, the Saak transform offers a joint spatial-spectral representation. It maps a local 3D cuboid into a 1D rectified spectral vector via a one-stage Saak transform. Multiple local 3D cuboids can be transformed in parallel, and their union forms a global 3D cuboid consisting of two spatial dimensions and one spectral dimension. The one-stage Saak transform consists of two cascaded operations: 1) signal transform via PCA and 2) sign-to-position format conversion. It allows both lossless and lossy transforms. The distance between any two input vectors and that of their corresponding output vectors is preserved to a certain degree. Furthermore, the one-stage Saak transform can be extended to multi-stage Saak transforms by cascading multiple one-stage Saak transforms, so as to provide a wide range of spatial-spectral representations and higher-order statistics.
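The per-channel comparison behind Fig. 2 (c)-(d) can be reproduced with a few lines of NumPy. The sketch below assumes that the Saak coefficients of N clean images and of their adversarial counterparts have already been computed and stacked into arrays of shape (N, H, W, C), where C indexes the spectral dimension; the function name is hypothetical.

```python
import numpy as np

def per_channel_rmse(clean_coeffs, adv_coeffs, normalize=True):
    """RMSE between clean and adversarial Saak coefficients, per spectral channel.

    Both inputs are assumed to have shape (N, H, W, C). Normalizing by the
    range of the clean coefficients makes the high-frequency discrepancy
    visible, as in Fig. 2 (c).
    """
    diff = clean_coeffs - adv_coeffs
    rmse = np.sqrt(np.mean(diff ** 2, axis=(0, 1, 2)))           # one value per channel
    if normalize:
        span = clean_coeffs.max(axis=(0, 1, 2)) - clean_coeffs.min(axis=(0, 1, 2))
        rmse = rmse / np.maximum(span, 1e-8)
    return rmse
```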

III-C Mathematical Formalization

Mathematically, the Saak-based preprocessing technique can be written as

$\hat{x} = S^{-1}\big(F(S(x_{adv}))\big),$   (2)

where $x_{adv}$ is an adversarial example, $S$ and $S^{-1}$ are the forward and inverse multi-stage Saak transforms, respectively, and $F$ denotes a filtering function to be discussed later. Given the Saak coefficients in an intermediate stage, one can reconstruct an image using the inverse Saak transform on the filtered Saak coefficients $F(S(x_{adv}))$. In other words, we convert an adversarial perturbation removal problem into a Saak coefficient filtering problem. We formalize the idea of Sec. III-B below.

First, we have

$\| x - x_{adv} \|^2 \approx \| S(x) - S(x_{adv}) \|^2 = \| S_H(x) - S_H(x_{adv}) \|^2 + \| S_L(x) - S_L(x_{adv}) \|^2,$   (3)

where $S_H(\cdot)$ and $S_L(\cdot)$ denote the orthogonal subsets of high- and low-frequency Saak coefficients. The approximation is based on the semi-distance-preserving property of the inverse Saak transform. Under the assumption that adversarial noise lies in the high-frequency region, which is supported by the discussion in Sec. III-B, we have $S_L(x) \approx S_L(x_{adv})$ and hence

$\| x - x_{adv} \|^2 \approx \| S_H(x) - S_H(x_{adv}) \|^2.$   (4)

Then, if we can design a filter operating on the high-frequency components so that the difference between the purified high-frequency Saak coefficients of clean and adversarial images is minimized, the difference between the clean and the reconstructed adversarial images is minimized as well. In other words, the adversarial perturbations are removed. This is investigated in the next subsection.

III-D Saak Coefficients Filtering

To reduce adversarial perturbations, we propose three high-frequency Saak coefficient filtering strategies to minimize $\| S_H(x) - S_H(x_{adv}) \|$ in this subsection. They are dynamic-range scaling, truncation, and clipping. Each of them is detailed below.

As shown in Fig. 2 (b), high-frequency Saak coefficients of clean images tend to have a smaller dynamic range, while those of adversarial examples have a larger one. Thus, the first adversarial perturbation filtering strategy is to re-scale the high-frequency Saak coefficients to match the statistics of clean images. We expect adversarial perturbations to be mitigated by enforcing the variances of the high-frequency Saak coefficients of clean images and adversarial examples to be the same. This is an empirical way to minimize $\| S_H(x) - S_H(x_{adv}) \|$.
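A minimal sketch of the scaling strategy is given below, assuming the last-stage Saak coefficients are stored as an (H, W, C) array with channels ordered from low to high frequency. The channel index separating the two bands and the scaling factor are hyper-parameters; the exact factor used in the paper is not specified here.

```python
import numpy as np

def scale_high_freq(coeffs, start_channel, factor=0.5):
    """Scale high-frequency Saak coefficients by a constant factor.

    `coeffs` has shape (H, W, C); channels from `start_channel` on are
    treated as high-frequency. `factor` is an illustrative hyper-parameter.
    """
    out = coeffs.copy()
    out[..., start_channel:] *= factor
    return out
```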

The second strategy is to truncate high-frequency Saak coefficients. Specifically, we set the least important Saak coefficients to zero. Since the HVS is less sensitive to high-frequency components, image compression algorithms exploit this psycho-visual property by quantizing them with a larger quantization step size. In addition, high-frequency Saak coefficients are very small (close to zero) for clean images, yet they may become larger for adversarial examples. Based on this observation, truncating high-frequency Saak coefficients is a simple way to minimize $\| S_H(x) - S_H(x_{adv}) \|$.
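Under the same assumptions as the scaling sketch above, truncation simply zeroes the high-frequency channels:

```python
def truncate_high_freq(coeffs, start_channel):
    """Truncation strategy: zero out the least important (high-frequency) channels."""
    out = coeffs.copy()
    out[..., start_channel:] = 0.0
    return out
```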

Setting          |      (2-2)        |      (2-4)        |       (2-5)         |      (4-2)
Filtered coeff.  |  40    42    44   |  640   660   680  |  2700   2710   2720 |  580   600   620
clean            |  84%   84%   68%  |  96%   96%   91%  |  96%    95%    95%  |  96%   95%   91%
FGSM             |  53%   57%   55%  |  30%   38%   37%  |  35%    35%    34%  |  33%   38%   40%
BIM              |  65%   69%   62%  |  49%   61%   61%  |  63%    63%    65%  |  54%   58%   64%
DF               |  77%   77%   65%  |  83%   83%   82%  |  86%    86%    86%  |  84%   83%   81%
CW_0             |  63%   61%   45%  |  56%   64%   66%  |  65%    62%    62%  |  58%   65%   64%
CW_2             |  84%   81%   66%  |  94%   92%   89%  |  93%    91%    90%  |  89%   93%   85%
CW_∞             |  82%   80%   67%  |  90%   90%   85%  |  92%    92%    92%  |  87%   89%   85%
All attacks      | 70.67% 70.83% 60.00% | 67.00% 71.33% 70.00% | 72.33% 71.50% 71.50% | 67.50% 71.00% 69.83%
TABLE I: Comparison of the classification accuracy on the selected CIFAR-10 test set with four Saak transform settings under various attacks, where the filtering strategy is scaling the high-frequency Saak coefficients by a constant factor. The best result is shown in bold font.

Truncating all high-frequency components might hurt the classification performance on clean data. The third strategy to combat adversarial perturbations is therefore to clip the high-frequency Saak coefficients to a small constant value (instead of zero).
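Clipping keeps the high-frequency channels but bounds their magnitude. The bound 0.02 below mirrors the clipping column of Table II and should be treated as a tunable hyper-parameter; as before, the layout of `coeffs` is an assumption of this sketch.

```python
import numpy as np

def clip_high_freq(coeffs, start_channel, bound=0.02):
    """Clipping strategy: limit high-frequency coefficients to a small magnitude."""
    out = coeffs.copy()
    out[..., start_channel:] = np.clip(out[..., start_channel:], -bound, bound)
    return out
```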

IV Experiments

IV-A Experimental Setup

We conducted experiments on the CIFAR-10 and ImageNet datasets. The CIFAR-10 dataset [30] consists of $32 \times 32$ color images drawn from 10 categories. The training and test sets contain 50,000 and 10,000 images, respectively. The ImageNet dataset [31] has 1,000 classes of various objects. It contains 1.2 million images in the training set and 50,000 in the validation set. We apply a central crop to images from the ImageNet validation set as the input for crafting adversarial examples.

To make a fair comparison with recent work [12, 13], we follow the same experimental setup and use DenseNet [32] and MobileNets [33] as the target models for generating adversarial examples and evaluating all defense methods on the CIFAR-10 and ImageNet datasets, respectively. We chose six popular and effective attack algorithms: FGSM [16], BIM [5], DF [19], and the CW_0, CW_2 and CW_∞ attacks [21], which generate adversarial examples under different distortion constraints. The least-likely class is chosen to generate targeted CW attacks. Following the setup of [12, 13], we construct a selected set by taking the first 100 correctly classified samples in the test set as seed images to craft adversarial examples, since the CW attack algorithms are too expensive to run on the whole test set.
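The selected evaluation set can be built with a short PyTorch routine such as the one below, which collects the first 100 test samples that the target model classifies correctly; the function name and loop structure are illustrative rather than part of the original evaluation code.

```python
import torch

def select_seed_images(model, loader, n=100, device="cpu"):
    """Collect the first `n` test samples the target model classifies correctly."""
    model.eval()
    seeds, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            pred = model(x).argmax(dim=1)
            for xi, yi, pi in zip(x, y, pred):
                if pi == yi:
                    seeds.append(xi.cpu())
                    labels.append(yi.cpu())
                if len(seeds) == n:
                    return torch.stack(seeds), torch.stack(labels)
    return torch.stack(seeds), torch.stack(labels)
```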

We compare our method with recent adversarial defense methods that apply input transformations or denoising to mitigate the adversarial effect, as reported in [12, 13, 11, 34]. Our method is very efficient since it requires no adversarial training, no change to the target model, and no back-propagation for training. It can be easily implemented in Python, and we will release the code.

IV-B Ablation Studies

As mentioned in [15], the multi-stage Saak transform provides a family of spatial-spectral representations, and its intermediate stages offer different spatial-spectral trade-offs. We first study the effect of two hyper-parameters of the multi-stage Saak transform: 1) the spatial dimension of the 3D local cuboids (LCs), and 2) the number of stages. We focus on four settings of the forward and inverse Saak transforms: 1) spatial dimension $2 \times 2$ with 2 stages, 2) spatial dimension $2 \times 2$ with 4 stages, 3) spatial dimension $2 \times 2$ with 5 stages, and 4) spatial dimension $4 \times 4$ with 2 stages. For brevity, we denote a setting by ({spatial dimension}-{stages}); for example, (2-5) represents the setting using local cuboids of spatial dimension $2 \times 2$ in a 5-stage Saak transform. We use the high-frequency coefficient scaling strategy with a constant factor in all experiments and report the CIFAR-10 classification results in Table I. For each setting, we consider three scenarios in terms of the number of filtered high-frequency Saak coefficients.

Filtering        |         Truncation          |       Clipping by 0.02
Filtered coeff.  |  2700   2710   2720   2730  |  2700   2710   2720   2730
clean            |   92%    92%    92%    91%  |   95%    94%    92%    91%
FGSM             |   60%    63%    63%    62%  |   44%    47%    47%    44%
BIM              |   72%    69%    73%    71%  |   67%    66%    69%    67%
DF               |   86%    87%    88%    86%  |   86%    85%    84%    84%
CW_0             |   71%    70%    68%    69%  |   69%    68%    68%    68%
CW_2             |   91%    90%    90%    90%  |   91%    91%    91%    91%
CW_∞             |   89%    90%    90%    90%  |   91%    91%    92%    91%
All attacks      | 78.17% 78.17% 78.67% 78.00% | 74.67% 74.67% 75.17% 74.17%
TABLE II: Comparison of classification results using the high-frequency Saak coefficient truncation and clipping strategies under various attacks on the selected CIFAR-10 test set (with Saak setting (2-5)). The best result is shown in bold font.

Since input images of CIFAR-10 are of spatial dimension $32 \times 32$, the output spatial dimensions of these settings are $8 \times 8$, $2 \times 2$, $1 \times 1$, and $2 \times 2$, respectively. The corresponding spectral dimensions are 53, 853, 3413 and 785. As shown in Table I, we see better defense performance as we increase the LC size or the number of stages, and the performance is comparable when the output spatial dimensions are the same. We believe that this is related to the receptive field size of the last-stage Saak coefficients used for image reconstruction: the multi-stage Saak transform can incorporate longer-distance pixel correlations and thereby mitigate adversarial perturbations more effectively. For this reason, we choose the 5-stage Saak transform with local cuboids of spatial dimension $2 \times 2$ (denoted by (2-5)) as the evaluation baseline in the following experiments.
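The output dimensions quoted above can be checked with a short computation. The spatial size simply shrinks by the LC size at each stage; the spectral recurrence K <- s^2 * K + 1 used below is inferred so that it reproduces the spectral dimensions 53, 853, 3413 and 785 stated in the text, and is not taken verbatim from the Saak paper.

```python
def saak_output_dims(image_size=32, spatial=2, stages=2, channels=3):
    """Output spatial size and spectral dimension for a multi-stage Saak setting."""
    size, k = image_size, channels
    for _ in range(stages):
        size //= spatial               # each stage merges spatial x spatial nodes
        k = spatial * spatial * k + 1  # inferred spectral-dimension recurrence
    return size, k

for setting in [(2, 2), (2, 4), (2, 5), (4, 2)]:
    print(setting, saak_output_dims(spatial=setting[0], stages=setting[1]))
# (2, 2) -> (8, 53), (2, 4) -> (2, 853), (2, 5) -> (1, 3413), (4, 2) -> (2, 785)
```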

Defense Methods                   | FGSM | BIM | DF  | CW_0 | CW_2 | CW_∞ | All attacks | clean
JPEG [14, 34] (Q=90)              | 38%  | 29% | 67% |  2%  | 80%  | 71%  |   47.83%    |  94%
Feature Distillation [13]         | 41%  | 51% | 79% | 18%  | 86%  | 76%  |   58.50%    |  94%
Bit Depth Reduction (5-bit) [12]  | 17%  | 13% | 40% |  0%  | 47%  | 19%  |   22.66%    |  93%
Bit Depth Reduction (4-bit) [12]  | 21%  | 29% | 72% | 10%  | 84%  | 74%  |   48.33%    |  93%
Median Smoothing (2x2) [12]       | 38%  | 56% | 83% | 85%  | 83%  | 86%  |   71.83%    |  89%
Non-local Mean (11-3-4) [12]      | 27%  | 46% | 76% | 11%  | 88%  | 84%  |   55.33%    |  91%
Cropping [11]                     | 46%  | 43% | 51% | 15%  | 79%  | 76%  |   51.66%    |  86%
TVM [11]                          | 41%  | 40% | 44% | 34%  | 75%  | 71%  |   50.83%    |  92%
Quilting [11]                     | 37%  | 42% | 36% | 25%  | 67%  | 70%  |   46.17%    |  90%
Ours (4-2)                        | 58%  | 70% | 84% | 69%  | 88%  | 88%  |   76.17%    |  90%
Ours (2-5)                        | 63%  | 73% | 88% | 68%  | 90%  | 90%  |   78.67%    |  92%
TABLE III: Comparison of classification results of different defense methods on the selected CIFAR-10 test set.
Defense Methods                   | FGSM | BIM | DF  | CW_0 | CW_2 | CW_∞ | All attacks | clean
JPEG [14, 34] (Q=90)              |  1%  |  0% |  8% |  4%  | 68%  | 32%  |   18.83%    |  70%
Feature Distillation [13]         |  8%  | 17% | 55% | 57%  | 82%  | 72%  |   48.50%    |  66%
Bit Depth Reduction (5-bit) [12]  |  2%  |  0% | 21% | 18%  | 66%  | 60%  |   27.83%    |  69%
Bit Depth Reduction (4-bit) [12]  |  5%  |  4% | 44% | 67%  | 82%  | 79%  |   46.83%    |  68%
Median Smoothing (2x2) [12]       | 22%  | 28% | 72% | 85%  | 84%  | 81%  |   62.00%    |  65%
Median Smoothing (3x3) [12]       | 33%  | 41% | 66% | 79%  | 79%  | 76%  |   62.33%    |  62%
Non-local Mean (11-3-4) [12]      | 10%  | 25% | 57% | 47%  | 86%  | 82%  |   51.17%    |  65%
Ours (4-2)                        | 47%  | 58% | 65% | 66%  | 71%  | 69%  |   62.67%    |  69%
Ours (4-2) + Mean filter          | 31%  | 46% | 64% | 68%  | 81%  | 81%  |   61.83%    |  81%
Ours (4-2) + Median filter        | 33%  | 45% | 70% | 84%  | 83%  | 83%  |   66.33%    |  86%
TABLE IV: Comparison of classification results of different defense methods on the selected ImageNet test set.

We show the classification accuracy using the other two coefficient filtering strategies, namely truncation and clipping, under the same baseline setting (2-5) in Table II. Comparing Tables I and II, we see that the coefficient truncation strategy performs the best, with a substantial gain over the coefficient scaling and clipping strategies. Thus, we choose the truncation strategy for the evaluations on the CIFAR-10 and ImageNet datasets.

IV-C Comparison with State-of-the-Art Methods

We compare our method with other state-of-the-art methods in combating adversarial perturbations on the CIFAR-10 dataset in Table III. Our Saak-transform-based method outperforms the state-of-the-art methods by a significant margin on all attack types except for the CW_0 attack. This may be attributed to the fact that the CW_0 adversarial noise is more prominent, as shown in Fig. 3. The colored patches visible in the Saak-transform-filtered image under the CW_0 attack show that our filtering strategy fails to remove this adversarial noise well, since it has already diffused into low-frequency channels. Yet, it is worthwhile to point out that our proposed method achieves 90% accuracy on CW_2 and CW_∞ adversarial examples of the CIFAR-10 dataset, which is close to the accuracy on clean data. Moreover, our method is powerful in removing FGSM-based perturbations. Furthermore, we compare different defense methods on the ImageNet dataset in Table IV. Our method outperforms the other defense methods when applied alone with the (4-2) setting. In addition, to verify the above conjecture about the CW_0 attack, we cascade the proposed method with two spatial denoising methods, median and mean smoothing, to further improve performance. As shown in Table IV, the median smoothing method does boost the performance on the CW_0 attack significantly, while improving the classification accuracy on clean data to 86%. These results demonstrate that our solution is complementary to spatial smoothing techniques and, more importantly, can greatly mitigate the adversarial effect without severely hurting the classification performance on clean images, unlike other methods. This is highly desirable, as a computer vision system is usually unaware of whether an input image has been maliciously perturbed.

Fig. 3: Comparison of adversarial examples and Saak-transform-filtered images for six adversarial attacks with Saak transform setting (4-2) and the truncation strategy. The difference images are converted to grayscale.

V Conclusion

We presented a method to filter out adversarial perturbations based on the Saak transform, a state-of-the-art spectral analysis tool. It can be used as a lightweight add-on to existing neural networks. The method was comprehensively evaluated under different settings and filtering strategies. It is effective and efficient in defending against adversarial attacks thanks to three characteristics: it is non-differentiable, it does not modify the target model, and it requires neither adversarial training nor label information. Experiments showed that the proposed method outperforms state-of-the-art defense methods by a large margin on both the CIFAR-10 and ImageNet datasets while maintaining good performance on clean images.

References