RAIN: Robust and Accurate Classification Networks with Randomization and Enhancement

04/24/2020 ∙ by Jiawei Du, et al.

Along with the extensive application of CNN models to classification tasks, there has been a growing requirement for their robustness against adversarial examples. In recent years, many adversarial defense methods have been introduced, but most of them have to sacrifice classification accuracy on clean samples to achieve better robustness. In this paper, we propose a novel framework, termed RAIN, that improves the robustness of a given classification CNN while retaining its accuracy. It consists of two conjugate modules: structured randomization (SRd) and detail generation (DG). Specifically, the SRd module randomly downsamples and shifts the input, which destroys the structure of adversarial perturbations and thereby improves model robustness. However, such operations also inevitably incur an accuracy drop. Our empirical study shows that the resultant image of the SRd module loses high-frequency details that are crucial for model accuracy. To remedy the accuracy drop, RAIN couples a deep super-resolution model as the DG module to recover rich details in the resultant image. We evaluate RAIN on the STL10 and ImageNet datasets, and the experimental results demonstrate its strong robustness against adversarial examples as well as classification accuracy on clean samples comparable to non-robustified counterparts. Our framework is simple and effective, and substantially extends the application of adversarial defense techniques to realistic scenarios where clean and adversarial samples are mixed.


1 Introduction

In the past decades, CNN-based classification models have been successfully applied to a variety of important systems such as finance [4], security [30] and driving assistance [27]. In these real-world applications, system safety is often deemed more important than performance. However, CNN models have been shown to be highly vulnerable to adversarial examples [31, 3]: adding visually imperceptible perturbations can easily fool CNNs into making fatal predictions. With the ever-growing applications of CNNs, this safety issue becomes more significant and demands more attention.

Figure 1: The pipeline of our proposed RAIN. The input image first goes through the structured randomization (SRd) module, where it is randomly shifted and randomly downsampled. This module enhances robustness but leads to an accuracy drop. Then, the downsampled image is sent to the detail generation (DG) module, which recovers details to remedy the accuracy. Lastly, the resultant image is fed into the given CNN classifier.

To enhance the adversarial robustness of CNNs, many adversarial defense approaches have been developed, which can be roughly divided into three categories: input transformation, adversarial training, and randomization. Input transformation methods [14, 36, 17] transform input images to cause obfuscated gradients or to project adversarial examples onto the clean data manifold. Such methods are not universal and can be evaded by adaptive attacks [2]. The second category, adversarial training methods such as [24, 35], achieves outstanding robustness by training CNNs from scratch with both clean images and augmented adversarial examples. However, this data augmentation consumes extensive computing resources compared to regular training. Randomization methods [34, 28] strike a balance between robustness and implementation cost by adding randomness to either the input or the DNN architecture. Due to the randomness, the inference path used to generate gradients differs, with high probability, from the path used to predict the adversarial examples, and thus the randomization modules mitigate the adversarial effects. Unfortunately, all the above defense methods pay the price of an accuracy drop for enhanced robustness [29, 37].

In this work, we propose a Robust and Accurate classIfication Network (RAIN) targeting better robustness while maintaining good accuracy. Our framework contains two modules: a structured randomization (SRd) module and a detail generation (DG) module. The SRd module performs random pooling and random shifting, which downsample and shift input images in random procedures to destroy adversarial perturbations. However, the added randomness also degrades accuracy on clean images, as in other defense methods [34, 28]. We compare the images processed by the SRd module with the original ones and find that the processed images are smoother and lack rich high-frequency details. We further conduct an empirical study showing that removing such high-frequency details greatly degrades accuracy. In view of these findings, we remedy the accuracy drop by recovering the high-frequency details of the processed images. This is achieved by a detail generation (DG) module that implements a deep super-resolution model. The pipeline of our proposed RAIN is shown in Figure 1.

We evaluate the RAIN framework on the STL10 and ImageNet datasets. Specifically, it achieves 68.6% robustness under the FGSM attack while retaining 93.3% accuracy on ImageNet, outperforming the existing randomization-based baselines and maintaining the highest accuracy among all defense baselines.

To summarize, we make the following contributions in this work:

  1. We propose a simple and practical framework, termed RAIN, that helps CNNs achieve enhanced robustness while maintaining high accuracy. RAIN can be plugged into any given CNN to enhance its performance.

  2. We introduce two simple yet effective structured randomization based defense methods. Besides serving as components of the RAIN framework, they are also of independent interest and can be integrated with other defense methods to improve their robustness further.

  3. We reveal the origin of the accuracy drop caused by our proposed randomization and develop a corresponding solution to remedy it. Experiments verify that our solution compensates for the accuracy loss.

2 Preliminaries on Adversarial Robustness

In this section, we specify the notations, and the goals and capabilities of the adversary in our defense scenarios. We provide an overview of adversarial attacks and the evaluation metrics of adversarial robustness. Suppose we are given a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ with samples from $C$ categories, where $x_i$ is an RGB color image and $y_i$ is its label. We train a CNN classification model, denoted as $f$. Let $\mathcal{L}(f(x), y)$ be the loss function for evaluating the model prediction $f(x)$ w.r.t. the label $y$, which is typically the cross-entropy loss.

2.1 Attack Models

Given an input image $x$ from category $y$, an adversarial attack crafts an adversarial example $x' = x + \delta$ that causes a prediction error of the classification model, i.e., $f(x') \neq y$. Here $\delta$ is the additive and imperceptible adversarial perturbation generated by a certain adversarial attack method $\mathcal{A}$. We take the Fast Gradient Sign Method (FGSM) [16] for illustration, which is an effective one-step adversarial attack method. The adversarial examples are crafted as

$x' = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(f(x), y)\big)$ (1)

where $\epsilon$ is the step size that controls the magnitude of the added noise. FGSM is representative of gradient-based attack methods [12, 6, 25], which use the back-propagated gradients w.r.t. the inputs to craft the perturbations.
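To make the crafting step concrete, below is a minimal PyTorch sketch of the FGSM update in Eq. (1); the model handle and the use of cross-entropy loss are illustrative assumptions rather than the exact setup of the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8/255):
    """One-step FGSM (Eq. 1): move x along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)      # L(f(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()          # x' = x + eps * sign(grad_x L)
    return x_adv.clamp(0.0, 1.0).detach()    # keep pixel values in a valid range
```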

For a fair comparison of different defense methods, a perturbation budget $\epsilon$ is specified so that any adversarial example must satisfy $\|x' - x\|_\infty \le \epsilon$. In this paper, we only consider the $\ell_\infty$-norm with perturbation budget $\epsilon$ to define the adversary's capability.

We first consider a white-box ensemble-pattern attack [34] in the robustness evaluation. The adversary is aware of the given CNN model and the defense module, as well as their internal gradients. The adversary crafts adversarial examples with the back-propagated gradients, but does not design any adaptive attack modification. Such an attack scenario is more difficult to defend against, and thus many previous defense methods only reported results under simpler vanilla attack scenarios [26, 28]. Another attack scenario we consider is a score-based black-box attack, which only allows the adversary to access the output logits, with the number of queries restricted to a given maximum.

2.2 Evaluation Metrics of Adversarial Robustness

A widely used metric to evaluate robustness is the prediction accuracy over the adversarial examples generated by certain attack methods [11]. Here, we only consider samples that are classified correctly by the given CNN model before being attacked. Formally, given a CNN model $f$, we randomly collect a robustness test set $\mathcal{D}_r$, where each element $(x_i, y_i) \in \mathcal{D}$ satisfies $f(x_i) = y_i$. For a certain attack method $\mathcal{A}$ with the perturbation budget $\epsilon$, the robustness is evaluated as

$\mathrm{Rob}(f, \mathcal{A}, \epsilon) = \frac{1}{|\mathcal{D}_r|} \sum_{(x_i, y_i) \in \mathcal{D}_r} \mathbb{1}\big[ f(x_i + \mathcal{A}(x_i, \epsilon)) = y_i \big]$ (2)

The above equation calculates the accuracy of a given CNN model on the adversarial examples crafted by a given adversarial attack method $\mathcal{A}$ with perturbation budget $\epsilon$. Note that the given CNN model could also be a defense model under robustness evaluation.
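As an illustration of this metric, the following sketch estimates Eq. (2) for a given attack; the attack callable (e.g., the FGSM sketch above) and the data loader over the test set are assumed for illustration only.

```python
import torch

@torch.no_grad()
def _predict(model, x):
    return model(x).argmax(dim=1)

def robustness(model, attack, loader, eps):
    """Accuracy on adversarial examples, restricted to samples that are
    correctly classified when clean (the robustness set D_r in Eq. 2)."""
    correct, total = 0, 0
    for x, y in loader:
        keep = _predict(model, x) == y           # build D_r on the fly
        if keep.sum() == 0:
            continue
        x, y = x[keep], y[keep]
        x_adv = attack(model, x, y, eps)         # gradients are enabled inside attack
        correct += (_predict(model, x_adv) == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)
```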

3 Randomization for Robustness

We propose a structured randomization module that consists of two operations, random pooling and random shifting, to defend against adversarial examples. In this section, we provide details of the two operations and reveal the mechanism behind their effectiveness. Lastly, we report robustness evaluation results of the two operations.

3.1 Structured Randomization

We first introduce the random pooling and random shifting operations in detail. Both of them disrupt the crafting of adversarial perturbations and enhance robustness.

Random Shifting. CNNs are known to be almost shift-invariant due to their pooling and convolution layers, which means small input shifts seldom affect their correct predictions. This shift-invariance inspires us to add randomness to the CNN by shifting inputs slightly and randomly.

We design a random shifting operation over the input images before feeding them to the given CNN model. The input image is shifted differently for each inference as follows. First, two shift values $s_h$ and $s_w$ are randomly sampled from uniform distributions, $s_h \sim \mathcal{U}(-pH, pH)$ and $s_w \sim \mathcal{U}(-pW, pW)$. Here, the magnitude of a shift value is the number of shifted pixels, with its sign indicating the shifting direction; $W$ and $H$ are the width and height of the input images, and $p$ is a predefined proportion, hence $|s_h| \le pH$ and $|s_w| \le pW$. Usually a very small $p$ is enough, and we fix $p$ to a small value in our experiments. Then, the random shifting operation shifts the input image by $s_h$ pixels vertically and $s_w$ pixels horizontally.

Figure 2: Demonstration of random shifting. Left: the original image; middle and right: randomly shifted versions with two different shift values. We use a small shift proportion $p$ in our module, which has little impact on predicting the image correctly.

Figure 2 demonstrates our random shifting operation. In this way, randomness is added to the CNN inference, which mitigates the effect of adversarial examples.
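A minimal sketch of the random shifting step is given below; the paper does not specify how image boundaries are handled, so a circular shift via torch.roll is assumed here purely for illustration.

```python
import torch

def random_shift(x, p=0.05):
    """Randomly shift a batch of images by at most a proportion p of H and W.

    x: tensor of shape (N, C, H, W). Boundary handling is an assumption:
    we use a circular shift (torch.roll) for simplicity."""
    _, _, H, W = x.shape
    sh = int(torch.randint(-int(p * H), int(p * H) + 1, (1,)))  # vertical shift
    sw = int(torch.randint(-int(p * W), int(p * W) + 1, (1,)))  # horizontal shift
    return torch.roll(x, shifts=(sh, sw), dims=(2, 3))
```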

Random Pooling. In addition to random shifting, we also introduce a random pooling operation to improve robustness further. For any input image, the random pooling operation divides the image into non-overlapping patches that completely tile the image. Next, it randomly picks one pixel from each patch with uniform probability. The resultant image, denoted by $x^{\downarrow}$, is thus downsampled by a factor $k$ to the size $\frac{H}{k} \times \frac{W}{k}$. Since the CNN classifier is trained on the original dataset $\mathcal{D}$, we upsample $x^{\downarrow}$ back to the original size for prediction. Specifically, bicubic upsampling is applied here first (we replace bicubic upsampling with the detail generation (DG) module in Section 5). We use the superscripts $\downarrow$ and $\uparrow$ to represent the downsampling and upsampling operations respectively, and the upsampled image is denoted as $x^{\downarrow\uparrow}$.
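The random pooling step can be sketched as follows, assuming a downsampling factor k (the exact factor used in the paper is not reproduced here): each non-overlapping k × k patch contributes one uniformly chosen pixel.

```python
import torch

def random_pooling(x, k=2):
    """Downsample by keeping one randomly chosen pixel per non-overlapping k x k patch.

    x: tensor of shape (N, C, H, W) with H and W divisible by k.
    Returns a tensor of shape (N, C, H//k, W//k)."""
    N, C, H, W = x.shape
    # Split each spatial dimension into (patch index, offset inside the patch).
    patches = x.view(N, C, H // k, k, W // k, k)
    # One random (row, col) offset per patch position, shared across the batch
    # for brevity; per-sample offsets would work the same way.
    r = torch.randint(0, k, (H // k, W // k))
    c = torch.randint(0, k, (H // k, W // k))
    i = torch.arange(H // k).unsqueeze(1)    # patch row indices
    j = torch.arange(W // k).unsqueeze(0)    # patch column indices
    return patches[:, :, i, r, j, c]
```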

Implementation details of the random shifting and random pooling are summarized in Algorithm 1.

1: Input: image x, CNN model f, maximum shifting proportion p, downsampling factor k
2: x_d ← Random_Pooling(x, k)
3: x_s ← Random_Shift(x_d, p)
4: x_u ← Upsample(x_s)    ▷ bicubic upsampling here; replaced by the DG module in Section 5
5: Prediction y ← f(x_u)
6: function Random_Pooling(x, k)
7:     for each non-overlapping k × k patch of x do
8:         randomly pick one pixel from the patch with uniform probability
9:     return the downsampled image x_d composed of the picked pixels
10: function Random_Shift(x, p)
11:     sample s_h ~ U(−pH, pH) and s_w ~ U(−pW, pW)
12:     assert |s_h| ≤ pH and |s_w| ≤ pW
13:     shift x by s_h pixels vertically and s_w pixels horizontally
14:     return the shifted image x_s
Algorithm 1 Structured randomization (SRd)

3.2 Randomization Brings Robustness

We conjecture that randomization improves robustness because it causes a misalignment between the fixed path along which the adversarial perturbation is crafted and the random inference path, as analyzed below.

Given a clean image $x$, when the adversary computes the gradient for crafting the adversarial perturbation, the gradient is generated on a randomly shifted version of $x$ with shift values $(s_h, s_w)$; we denote this gradient-generation inference path by $(s_h, s_w)$. Then, when the crafted adversarial example $x'$ is fed to the classifier for prediction, the CNN follows a different inference path $(s_h', s_w')$, drawn independently. Thus, the most adversarial perturbation for the prediction path corresponds to the version of $x$ shifted by $(s_h', s_w')$ rather than by $(s_h, s_w)$. The probability that the crafted perturbation coincides with this most adversarial perturbation, i.e., that the two independently sampled shift pairs are identical, is very low. Thus, randomization brings robustness against adversarial attacks.

Similarly, for the random pooling operation, the probability that the pooling positions selected when generating gradients are the same as those used in the adversarial-example prediction path is also extremely low. The combination of the two randomization operations makes the given CNN very robust against gradient-based adversarial attacks. We conduct experiments to examine the robustness of this structured randomization module, and also to find the best order of the three steps: random pooling, random shifting and upsampling.

3.3 Robustness Evaluation Experiments

3.3.1 Setup.

The two randomization operations and the upsampling have three possible orders, as listed in Table 1. Note that the upsampling has to be placed after the random pooling. We conduct robustness verification experiments on the three orders and, for reference, on the vanilla CNN model. Robustness is evaluated under the FGSM attack with $\epsilon = 8/255$ and the PGD attack with $\epsilon = 16/255$ (with a fixed step size and number of iterations).

We use the STL10 [8] and ImageNet [9] datasets for evaluation. The robustness metric is formulated in Section 2.2: we evaluate robustness by testing the prediction accuracy on a predefined robustness set $\mathcal{D}_r$, which contains samples from the test set that are predicted correctly by the given CNN model. The experiments in the following sections follow the same settings unless otherwise specified. We aim to verify the robustness and to find the best order of the three steps: random pooling, random shifting and upsampling.

We use well-trained ResNets [18] on both the STL10 and ImageNet datasets for the robustness verification experiments. The ResNet for STL10 contains 11 convolutional layers and 1 fully-connected layer. The ResNet50 for ImageNet consists of 5 stages, each with a convolution block and an identity block. The detailed architectures of the ResNets used are given in the appendix.

STL10             Accuracy (clean)   Robustness (FGSM-8/255)   Robustness (PGD-16/255)
Original model    1.000              0.090                     0.000
P B S             0.810              0.710                     0.227
P S B             0.824              0.720                     0.287
S P B             0.806              0.712                     0.270

ImageNet          Accuracy (clean)   Robustness (FGSM-8/255)   Robustness (PGD-16/255)
Original model    1.000              0.197                     0.000
P B S             0.642              0.506                     0.121
P S B             0.643              0.573                     0.252
S P B             0.639              0.563                     0.244
Table 1: Robustness evaluation of the three step orders. "P", "B" and "S" stand for random pooling, bicubic upsampling and random shifting respectively. We evaluate robustness on the STL10 and ImageNet datasets against the FGSM and PGD attacks. All three orders are more robust than the vanilla CNN model; among them, the "P S B" order achieves the strongest robustness.

3.3.2 Results.

The experimental results are listed in Table 1, from which we can see that all three orders achieve more than 50% robustness under the FGSM attack. This verifies the effectiveness of randomization in enhancing model robustness. Furthermore, the order "P S B" achieves the best robustness on both datasets, e.g., 57.3% under the FGSM attack and 25.2% under the PGD attack on ImageNet. As a reference, the vanilla CNN model only achieves 19.7% and 0.0% robustness under the FGSM and PGD attacks. Our proposed structured randomization module thus improves robustness significantly over the vanilla CNN model.

4 Analyzing Accuracy Drop from Robustness

Even though the proposed structured randomization enhances robustness against adversarial examples, the accuracy of the defense models on clean images is only around 82% on STL10 and around 64% on ImageNet, much worse than that of the vanilla CNN model. There is thus a significant drop in accuracy due to the pursuit of model robustness. In this section, we dig into the root cause of this drop and then mitigate it.

Figure 3: The original image, the resultant image processed by SRd and bicubic upsampling, and their difference. The second row shows the frequency spectra of the first-row images; the center of each spectrum is the zero-frequency component. The difference of the frequency spectra shows that the processed image is damaged more in the four corners, which correspond to the high-frequency components.

To find the reasons for the accuracy drop, we compare the original images with the images processed by the structured randomization module, as shown in Figure 3. The left image has been downsampled, randomly shifted and upsampled by the structured randomization module. Compared with the original image on the right, the processed image is clearly smoother and lacks details. The details of an image usually correspond to its high-frequency components, and the frequency spectra in Figure 3 verify this point. Therefore, we hypothesize that the accuracy drop may come from the loss of high-frequency components. We then conduct experiments to study the contribution of high-frequency components to the accuracy of a well-trained CNN model, so as to verify our hypothesis.


Setup. The test datasets and the corresponding trained CNN models evaluated here are the same as those in Section 3.3.

We conduct the experiments in an ablative manner to examine the impact of losing high-frequency details on accuracy. We continually decrease the threshold $r$ for removing high-frequency components from the image, keeping all other factors the same, and observe the change in accuracy. The detailed steps are as follows. We first transform the input image $x$ into the frequency domain by FFT, $F = \mathcal{F}(x)$, where $F$ is the complex-valued frequency-domain representation of the same size as $x$. Then, we remove from $F$ the high-frequency components beyond the given threshold $r$; the resultant spectrum is

$\hat{F}(u, v) = \begin{cases} F(u, v), & d\big((u, v), (u_c, v_c)\big) \le r \\ 0, & \text{otherwise} \end{cases}$ (3)

Here $d(\cdot, \cdot)$ is the Euclidean distance, $(u, v)$ is an index in the frequency domain and $(u_c, v_c)$ is the index of the centroid, which represents the zero-frequency element. The centroid is the same for all images of the same size in the frequency domain. Then, we calculate the energy of the resultant frequency spectrum. The energy of a spectrum is computed by

$E(F) = \sum_{u, v} |F(u, v)|^2$ (4)
Figure 4: On both datasets, the remaining energy decreases as more high-frequency components are removed (smaller $r$). Even when the energy of the filtered spectrum is scaled to match that of the original, removing high-frequency components still impairs the test accuracy.

Note that removing the high-frequency components leads to the loss of the corresponding spectral densities, so the energy of the frequency spectrum also decreases, i.e., $E(\hat{F}) \le E(F)$. Next, to avoid the influence of this energy loss, we uniformly scale up $\hat{F}$ so that it holds the same energy as the original spectrum. In this way, we can fairly compare the impact on accuracy of removing different proportions of high-frequency components.
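The ablation can be sketched with NumPy as follows; rescaling the filtered spectrum by the square root of the energy ratio is one straightforward way to hold the energy constant and is an assumption of this sketch.

```python
import numpy as np

def remove_high_freq(img, r):
    """Keep frequencies within radius r of the spectrum center (Eq. 3),
    rescale so the energy (Eq. 4) matches the original, and invert the FFT.

    img: 2D grayscale array; apply per channel for color images."""
    F = np.fft.fftshift(np.fft.fft2(img))            # zero frequency at the center
    H, W = F.shape
    u, v = np.ogrid[:H, :W]
    dist = np.sqrt((u - H / 2) ** 2 + (v - W / 2) ** 2)
    F_hat = np.where(dist <= r, F, 0)                # drop high-frequency components
    energy = lambda S: np.sum(np.abs(S) ** 2)        # Eq. (4)
    if energy(F_hat) > 0:
        F_hat *= np.sqrt(energy(F) / energy(F_hat))  # hold total energy constant
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))
```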


Results. As Figure 4 shows, removing high-frequency components does reduce accuracy significantly. As the threshold $r$ decreases, the accuracy drops rapidly on both datasets, even when the energy is held constant. These results verify that the details damaged by the structured randomization module are an important factor leading to the accuracy drop.

Since the destroyed details are an essential cause of the accuracy drop, we are motivated to develop an approach that recovers these details and hence obtain a robust and accurate defense framework.

5 RAIN: Robust and Accurate Defense Network

Based on the experiments in the previous sections, the structured randomization (SRd) module in our RAIN framework substantially enhances the robustness of CNN models on both the STL10 and ImageNet datasets, but at the price of reduced accuracy. Moreover, the details damaged by the SRd module have been shown to be an important cause of this accuracy drop. To remedy it, we introduce a detail generation (DG) module, detailed in Section 5.1: we replace the bicubic upsampling in Algorithm 1 with the DG module to obtain the complete RAIN framework. We then conduct experiments showing that the proposed framework achieves better robustness with less accuracy sacrifice, under both white-box and black-box attacks.

5.1 Architectures of RAIN

Detail Generation. In our RAIN framework, to remedy the accuracy drop caused by pursuing better robustness with the structured randomization (SRd) module, we apply a detail generation (DG) module. This module is implemented with a super-resolution (SR) model to generate the details of the images processed by the structured randomization module. SR models are able to upsample low-resolution images and enhance their details [10]. Deep-learning-based SR models, such as EDSR [23], achieve impressive performance on super-resolution tasks. Therefore, we replace the bicubic upsampling operation with a deep-learning-based SR model, namely EDSR.

To show the effectiveness of EDSR in generating details, a given image is processed by the SRd + bicubic pipeline and by the SRd + EDSR pipeline respectively; the resultant images are denoted by $x_{\mathrm{bic}}$ and $x_{\mathrm{EDSR}}$. In Figure 5(a), we compare the spectra of the two resultant images. Given a spectrum, the spectral density is computed at each frequency, i.e., at each distance from the spectrum center. We apply the Fourier transform to $x_{\mathrm{bic}}$ and $x_{\mathrm{EDSR}}$ and plot the spectral density as a function of frequency in Figure 5(a). We can see that the spectrum of $x_{\mathrm{EDSR}}$ contains stronger high-frequency components than that of $x_{\mathrm{bic}}$. Thus, EDSR enhances the high-frequency details in the resultant images.
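The exact spectral-density formula is not reproduced above; a common choice, assumed in the sketch below, is the radially averaged magnitude spectrum.

```python
import numpy as np

def radial_spectral_density(img):
    """Average |F(u, v)| over all components at the same integer distance
    from the spectrum center, giving one density value per frequency radius."""
    F = np.fft.fftshift(np.fft.fft2(img))
    H, W = F.shape
    u, v = np.ogrid[:H, :W]
    radius = np.sqrt((u - H / 2) ** 2 + (v - W / 2) ** 2).astype(int)
    mag = np.abs(F)
    density = np.bincount(radius.ravel(), weights=mag.ravel())
    counts = np.bincount(radius.ravel())
    return density / np.maximum(counts, 1)    # mean magnitude per radius
```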

Figure 5: Analysis of the effects of RAIN in processing images. (a) Spectrum comparison between images processed by EDSR and by bicubic upsampling. (b) Comparison of feature maps from the Res3a block for vanilla adversarial images (middle) and images processed by RAIN (right). The strong responses at some locations are alleviated; the adversarial images, originally recognized wrongly, are classified correctly after RAIN.

Overall Pipeline. The whole RAIN pipeline is as follows: for a given well-trained classification model $f$, RAIN first processes the input images through the random pooling and random shifting operations. Then, the well-trained EDSR model upsamples the images back to the regular size and enriches their details. Afterwards, the resultant images are fed to the given CNN classifier $f$. Lastly, to better generate details that are useful to $f$, the parameters of the SR model are slightly fine-tuned to further remedy the accuracy drop.
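Putting the pieces together, the inference pipeline can be sketched as below, reusing the random_pooling and random_shift sketches from Section 3; sr_model stands in for the EDSR module and classifier for the given CNN, both assumed to be pretrained (the SR model possibly lightly fine-tuned).

```python
import torch.nn as nn

class RAIN(nn.Module):
    """Sketch of the RAIN inference pipeline: SRd (pool + shift) -> DG (SR model) -> classifier."""

    def __init__(self, sr_model, classifier, k=2, p=0.05):
        super().__init__()
        self.sr_model = sr_model        # e.g., a pretrained EDSR upsampler
        self.classifier = classifier    # the given, well-trained CNN
        self.k, self.p = k, p

    def forward(self, x):
        x = random_pooling(x, self.k)   # SRd: random downsampling
        x = random_shift(x, self.p)     # SRd: random shifting ("P S B" order)
        x = self.sr_model(x)            # DG: upsample and recover high-frequency details
        return self.classifier(x)
```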

5.2 Robustness to White-box Attacks

We conduct experiments to compare our RAIN with other adversarial defense methods under white-box attacks.

5.2.1 Experiment Setup

We compare our proposed RAIN with other baselines under white-box adversarial attacks. The experiment settings and evaluation metrics are the same as in Section 3.3. The EDSR model is a stack of residual blocks with a fixed number of filters and is trained on the DIV2K dataset [1]. Robustness is evaluated under the FGSM attack with $\epsilon = 8/255$ and the PGD attack with $\epsilon = 16/255$ (with a fixed step size and number of iterations). We choose four recent baselines as the benchmark for our evaluation. Two of them are randomization-based defense methods: random padding and resizing [34] and pixel deflection [28]. The other two are adversarial-training-based defense methods: adversarial training [24] and feature denoising [35].

STL10                            Accuracy (clean)   Robustness (FGSM-8/255)   Robustness (PGD-16/255)
Pixel Deflection [28]            0.883              0.286                     0.065
Random Padding Resizing [34]     0.907              0.576                     0.070
Adversarial Training [24]        0.705              0.592                     0.649
Feature Denoising [35]           0.696              0.631                     0.668
RAIN                             0.929              0.745                     0.237

ImageNet                         Accuracy (clean)   Robustness (FGSM-8/255)   Robustness (PGD-16/255)
Pixel Deflection [28]            0.858              0.406                     0.117
Random Padding Resizing [34]     0.928              0.644                     0.154
Adversarial Training [24]        0.623              0.620                     0.417
Feature Denoising [35]           0.653              0.648                     0.455
RAIN                             0.933              0.686                     0.273
Table 2: Robustness evaluation of RAIN and baselines under white-box attacks on the STL10 and ImageNet datasets. The white-box attacks are end-to-end FGSM with $\epsilon = 8/255$ and PGD with $\epsilon = 16/255$.

5.2.2 Experiment Results

Table 2 shows the accuracy and robustness results under white-box attacks. Our proposed RAIN outperforms the other four baselines in both accuracy (92.9%, 93.3%) and robustness under the FGSM attack (74.5%, 68.6%) on the two datasets. Although the two adversarial-training-based baselines achieve the best robustness under the PGD attack, they have much worse accuracy. More importantly, both of them are trained from scratch with adversarial examples crafted by the PGD attack with the same $\epsilon$, which makes them particularly robust against iterative attack methods. Figure 5(b) shows the difference in feature maps with and without RAIN for a given CNN model; RAIN mitigates the adversarial effect of malicious examples.

5.3 Robustness to Black-box Attacks

We then compare our proposed RAIN with the baselines under black-box adversarial attacks to test its robustness.

5.3.1 Experiment Setup

The experiments follow the same benchmark as the white-box experiments in Section 5.2, apart from the adversarial attack methods. We select two black-box adversarial attack methods, ZOO [7] and NES [19], to evaluate our defense approach RAIN. The perturbation budget is $\epsilon = 8/255$ for all experiments under black-box adversarial attacks.

STL10                            Accuracy (clean)   Robustness (ZOO-8/255)   Robustness (NES-8/255)
Pixel Deflection [28]            0.883              0.679                    0.650
Random Padding Resizing [34]     0.907              0.854                    0.873
Adversarial Training [24]        0.705              0.705                    0.663
Feature Denoising [35]           0.696              0.621                    0.594
RAIN                             0.929              0.912                    0.871

ImageNet                         Accuracy (clean)   Robustness (ZOO-8/255)   Robustness (NES-8/255)
Pixel Deflection [28]            0.858              0.846                    0.841
Random Padding Resizing [34]     0.928              0.867                    0.881
Adversarial Training [24]        0.623              0.620                    0.611
Feature Denoising [35]           0.653              0.663                    0.641
RAIN                             0.933              0.885                    0.882
Table 3: Robustness evaluation of our proposed RAIN and baselines under black-box attacks on the STL10 and ImageNet datasets. All black-box attacks use $\epsilon = 8/255$.

5.3.2 Experiment Results

Table 3 shows the accuracy and robustness results under black-box attacks. Similar to the white-box results, our proposed RAIN achieves the highest robustness among all methods under the ZOO attack (91.2%, 88.5%) on both datasets. These results verify that our proposed RAIN is robust under different adversarial attack methods and on different datasets. Last but not least, our proposed RAIN achieves better robustness while doing little harm to accuracy.

5.3.3 Discussion and Future Work

We can see that the DG module increases the spectral density in the high-frequency range, which corresponds to generating details. Although the SR model in the DG module performs well at generating details, the DG module is not limited to SR models. Other generative models, such as GANs [22], and image enhancement methods [15] would also be suitable for detail generation. We will keep investigating different solutions for remedying the accuracy in future work.

6 Related Work

6.0.1 Adversarial Attack

The investigation of adversarial examples was initiated by [31], which shows that well-crafted, visually imperceptible perturbations can cause prediction errors of well-trained CNNs. Later, the Fast Gradient Sign Method (FGSM) [16] was developed to compute such adversarial perturbations by conducting gradient ascent on the original input. Following FGSM, several powerful iterative adversarial attack methods were developed, including DeepFool [25], PGD [24], MI-FGSM [12] and the C&W attack [6]. All of them need access to the internal back-propagated gradients on the original images to generate attacks and are thus called white-box attack methods. On the contrary, gradients are not available in the black-box setting. To generate attacking perturbations, ZOO [7] applies the symmetric difference quotient [21] to estimate the back-propagated gradient of each pixel from the output changes in response to queries. Though achieving an attack effect comparable to many white-box methods, it requires an excessive number of queries for gradient estimation. Recently, many researchers have focused on improving black-box attack efficiency [33, 20, 13]. For instance, a natural evolution strategy (NES) was proposed by [21, 19] to estimate the back-propagated gradient on the original images.

6.0.2 Adversarial Defense

Randomization was introduced at inference time by [28] to obfuscate back-propagated gradients, by randomly sampling pixels and replacing them with their neighbors. Besides, random resizing and padding layers [34] are also able to interrupt the gradient computation and thus impair the attack methods. Such methods perform well against both white-box and black-box attack methods [11]. Randomization-based defense methods do not require training from scratch and can be applied directly and quickly to robustify any well-trained CNN model; this is one of their significant advantages over other defense methods. Adversarial training from scratch can improve robustness against adversarial examples further [24]. In addition to adversarial training, a recent work [35] proposes a feature denoising filter that uses non-local means [5] to denoise the perturbed features and improve robustness further. Although adversarial training methods offer stronger robustness, they require a much longer training time and still suffer from an accuracy drop.

Image super-resolution (SR) has also been explored to improve the robustness of CNN models, but the relevant studies are very few. A recent work [26] uses SR to upsample adversarial examples onto the natural image manifold and applies wavelet denoising to the upsampled examples for defense. However, the authors only tested their method in black-box attack scenarios, where the SR model is unknown to the adversary. Our experiments show that their method is fragile in the ensemble-pattern white-box attack scenario, where the adversary is aware of the defense module and its gradients. More importantly, their released code shows that they only upsampled the adversarial examples, without resizing them back to the original size, while the clean images were not upsampled by SR. Such leaked prior information makes the accuracy evaluation unfair. Different from that work, our RAIN processes adversarial examples in the same way as clean images: we apply SR to recover the details of the input images to remedy the accuracy drop, and we also evaluate RAIN in white-box scenarios.

The performance decline on clean images is the cost of enhanced robustness. The trade-off between adversarial robustness and accuracy on clean images was reported in [29]. Subsequently, a theoretical explanation was given in [32], claiming that the trade-off exists because the features learned by robust models and by accurate models are fundamentally different; however, that work did not propose a solution for improving the trade-off either.

7 Conclusions

In this paper, we enhance the robustness and at the same time maintain the accuracy of given CNNs by proposing the RAIN framework. RAIN contains a structured randomization module and a detail generation module. The structured randomization module downsamples and shifts the input images randomly for improved robustness. We then investigate the root of the performance drop caused by the randomization module through observation, experiments and analysis, and find that the detail information damaged in the randomization process leads to the accuracy drop. Inspired by these findings, we devise a deep super-resolution model as the detail generation module to upsample the images and recover the details lost in the structured randomization module, thus remedying the accuracy drop. Last but not least, the evaluation experiments conducted on the STL10 and ImageNet datasets confirm the robustness improvement and maintained accuracy of our proposed framework.

References

  • [1] E. Agustsson and R. Timofte (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135. Cited by: §5.2.1.
  • [2] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420. Cited by: §1.
  • [3] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli (2013) Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402. Cited by: §1.
  • [4] L. Bottou, Y. Bengio, and Y. Le Cun (1997) Global training of document processing systems using graph transformer networks. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 489–494. Cited by: §1.
  • [5] A. Buades, B. Coll, and J. Morel (2005) A non-local algorithm for image denoising. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, pp. 60–65. Cited by: §6.0.2.
  • [6] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §2.1, §6.0.1.
  • [7] P. Chen, H. Zhang, Y. Sharma, J. Yi, and C. Hsieh (2017) ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. Cited by: §5.3.1, §6.0.1.
  • [8] A. Coates, A. Ng, and H. Lee (2011) An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp. 215–223. Cited by: §3.3.1.
  • [9] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §3.3.1.
  • [10] C. Dong, C. C. Loy, K. He, and X. Tang (2015) Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence 38 (2), pp. 295–307. Cited by: §5.1.
  • [11] Y. Dong, Q. Fu, X. Yang, T. Pang, H. Su, Z. Xiao, and J. Zhu (2019) Benchmarking adversarial robustness. arXiv preprint arXiv:1912.11852. Cited by: §2.2, §6.0.2.
  • [12] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9185–9193. Cited by: §2.1, §6.0.1.
  • [13] J. Du, H. Zhang, J. T. Zhou, Y. Yang, and J. Feng (2019) Query-efficient meta attack to deep neural networks. arXiv preprint arXiv:1906.02398. Cited by: §6.0.1.
  • [14] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy (2016) A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853. Cited by: §1.
  • [15] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand (2017) Deep bilateral learning for real-time image enhancement. ACM Transactions on Graphics (TOG) 36 (4), pp. 1–12. Cited by: §5.3.3.
  • [16] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §2.1, §6.0.1.
  • [17] C. Guo, M. Rana, M. Cisse, and L. Van Der Maaten (2017) Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117. Cited by: §1.
  • [18] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §3.3.1.
  • [19] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin (2018) Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598. Cited by: §5.3.1, §6.0.1.
  • [20] A. Ilyas, L. Engstrom, and A. Madry (2018) Prior convictions: black-box adversarial attacks with bandits and priors. arXiv preprint arXiv:1807.07978. Cited by: §6.0.1.
  • [21] P. D. Lax and M. S. Terrell (2014) Calculus with applications. Springer. Cited by: §6.0.1.
  • [22] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690. Cited by: §5.3.3.
  • [23] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 136–144. Cited by: §5.1.
  • [24] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §1, §5.2.1, Table 2, Table 3, §6.0.1, §6.0.2.
  • [25] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2574–2582. Cited by: §2.1, §6.0.1.
  • [26] A. Mustafa, S. H. Khan, M. Hayat, J. Shen, and L. Shao (2019) Image super-resolution as a defense against adversarial attacks. IEEE Transactions on Image Processing 29, pp. 1711–1724. Cited by: §2.1, §6.0.2.
  • [27] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2016) Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697 1 (2), pp. 3. Cited by: §1.
  • [28] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer (2018) Deflecting adversarial attacks with pixel deflection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8571–8580. Cited by: §1, §1, §2.1, §5.2.1, Table 2, Table 3, §6.0.2.
  • [29] D. Su, H. Zhang, H. Chen, J. Yi, P. Chen, and Y. Gao (2018) Is robustness the cost of accuracy?–a comprehensive study on the robustness of 18 deep image classification models. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 631–648. Cited by: §1, §6.0.2.
  • [30] W. Sultani, C. Chen, and M. Shah (2018) Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6479–6488. Cited by: §1.
  • [31] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1, §6.0.1.
  • [32] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry (2018) Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152. Cited by: §6.0.2.
  • [33] C. Tu, P. Ting, P. Chen, S. Liu, H. Zhang, J. Yi, C. Hsieh, and S. Cheng (2019) AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 742–749. Cited by: §6.0.1.
  • [34] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille (2017) Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991. Cited by: §1, §1, §2.1, §5.2.1, Table 2, Table 3, §6.0.2.
  • [35] C. Xie, Y. Wu, L. v. d. Maaten, A. L. Yuille, and K. He (2019) Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509. Cited by: §1, §5.2.1, Table 2, Table 3, §6.0.2.
  • [36] W. Xu, D. Evans, and Y. Qi (2017) Feature squeezing: detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155. Cited by: §1.
  • [37] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan (2019) Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573. Cited by: §1.