Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder

05/06/2020 · Guanlin Li, et al. · Nanyang Technological University

Whereas adversarial training is employed as the main defence strategy against specific adversarial samples, it has limited generalization capability and incurs excessive time complexity. In this paper, we propose an attack-agnostic defence framework to enhance the intrinsic robustness of neural networks, without jeopardizing their ability to generalize to clean samples. Our Feature Pyramid Decoder (FPD) framework applies to all block-based convolutional neural networks (CNNs). It implants denoising and image restoration modules into a targeted CNN, and it also constrains the Lipschitz constant of the classification layer. Moreover, we propose a two-phase strategy to train the FPD-enhanced CNN, utilizing ϵ-neighbourhood noisy images with multi-task and self-supervised learning. Evaluated against a variety of white-box and black-box attacks, we demonstrate that FPD-enhanced CNNs gain sufficient robustness against general adversarial samples on MNIST, SVHN and CALTECH. In addition, if we further conduct adversarial training, the FPD-enhanced CNNs perform better than their non-enhanced versions.


1 Introduction

The ever-growing ability of deep learning has found numerous applications, mainly in image classification, object detection, and natural language processing [he_deep_2016, lin_feature_2017, vaswani_attention_2017]. While deep learning has brought great convenience to our lives, its weaknesses have also caught researchers' attention, especially in image classification. Since the seminal works of [szegedy_intriguing_2013, nguyen_deep_2015], many follow-up works have demonstrated a great variety of methods for generating adversarial samples: though still easily recognized by human eyes, they are often misclassified by neural networks. More specifically, most convolutional layers are very sensitive to the perturbations introduced by adversarial samples (e.g., [che_adversarial_2019, xu_interpreting_2019]), resulting in misclassifications. These so-called adversarial attacks may adopt either white-box or black-box approaches, depending on their knowledge of the target network, and they mostly use gradient-based methods [goodfellow_explaining_2014, madry_towards_2017, tramer_ensemble_2018] or score-based methods [carlini_towards_2017] to generate adversarial samples.

To thwart these attacks, many defence methods have been proposed. Most of them use adversarial training to increase network robustness, e.g., [athalye_obfuscated_2018, kannan_adversarial_2018]. However, as training often targets a specific attack, the resulting defence can hardly be generalized, as hinted in [tramer_ensemble_2018]. In order to defend against various attacks, a large amount and variety of adversarial samples are required to retrain the classifier, leading to high time complexity. In the meantime, little attention has been paid to directly designing robust frameworks in an attack-agnostic manner, except for a few touches on denoising [song_pixeldefend:_2018, xie_feature_2019] and obfuscating gradients [guo_countering_2018, song_pixeldefend:_2018] that aim to directly enhance a target network in order to cope with any potential attack.

To enhance the intrinsic robustness of neural networks, we propose an attack-agnostic defence framework applicable to all types of block-based CNNs. We aim to thwart both white-box and black-box attacks without crafting any specific adversarial attacks. Our Feature Pyramid Decoder (FPD) framework implants a target CNN with both denoising and image restoration modules to filter an input image at multiple levels; it also deploys a Lipschitz constant constraint at the classification layer to limit the output variation in the face of attack perturbations. In order to train an FPD-enhanced CNN, we propose a two-phase strategy; it utilizes ϵ-neighbourhood noisy images to drive multi-task and self-supervised learning.

Figure 1: The structure of a block-based CNN enhanced with the proposed FPD framework: it consists of the Lipschitz constant constrained classification layer, the front denoising module, the image restoration module, a middle denoising layer and the back denoising module. ϵ-neighbourhood noisy samples and original samples are used to train the FPD. Orange, blue and green blocks represent the original components of the CNN, the proposed components implanted into the CNN, and the modified components of the CNN, respectively.

As shown in Figure 1, FPD employs a front denoising module, an image restoration module, a middle denoising layer, and a back denoising module. Both the front and back denoising modules consist of the original CNN blocks interleaved with inner denoising layers, and the inner denoising layers are empirically implanted only into the shallow blocks of the CNN. Enabled by the image restoration module, the whole enhanced CNN exhibits a multi-scale pyramid structure (a code sketch of this data flow is given at the end of this section). The multi-task learning concentrates on improving both the quality of the regenerated images and the performance of the final classification. Aided by the self-supervision signal from image restoration, the enhanced CNN can be trained to denoise images and abstract features from the denoised images. In summary, we make the following major contributions:


  • Through a series of exploration experiments, we propose a novel defence framework. Our FPD framework aims to enhance the intrinsic robustness of all types of block-based CNNs.

  • We propose a two-phase strategy for strategically training the enhanced CNN, utilizing ϵ-neighbourhood noisy images with both self-supervised and multi-task learning.

  • We validate our framework's performance on the MNIST, SVHN and CALTECH datasets in defending against a variety of white-box and black-box attacks, achieving promising results. Moreover, under adversarial training, an enhanced CNN is much more robust than its non-enhanced version.

Owing to the unavoidable limitations of evaluating robustness, we release our network on GitHub to invite researchers to conduct extended evaluations.
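
To make the data flow of Figure 1 concrete, the following minimal PyTorch sketch composes the five components; the sub-module internals, signatures and feature shapes are placeholders of our own rather than the authors' released code, and detailed sketches of each module are given in Section 3.

```python
import torch.nn as nn

class FPD(nn.Module):
    """Minimal sketch of the FPD data flow in Figure 1 (placeholder sub-modules)."""
    def __init__(self, front, restore, middle_denoise, back, lcc_head):
        super().__init__()
        self.front = front                    # front denoising module (CNN blocks + inner denoising)
        self.restore = restore                # image restoration module (pyramid decoder)
        self.middle_denoise = middle_denoise  # middle denoising layer
        self.back = back                      # back denoising module (weights shared with front)
        self.lcc_head = lcc_head              # Lipschitz-constant-constrained classifier

    def forward(self, x):
        feats = self.front(x)                                 # multi-scale features of the (noisy) input
        restored = self.middle_denoise(self.restore(feats))   # regenerated, denoised image
        logits = self.lcc_head(self.back(restored))           # classify the restored image
        return logits, restored
```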

(a) inner denoising layer with bottleneck
(b) inner denoising layer without bottleneck
(c) middle denoising layer without bottleneck
Figure 2: Three types of denoising layers that we have experimented with: (a) an inner denoising layer linking two residual blocks, with a bottleneck; (b) an inner denoising layer linking two residual blocks, without a bottleneck; and (c) a middle denoising layer that denoises the input of the last part, without a bottleneck.

2 Related Work

Adversarial attack and training

White-box attacks are typically constructed based on the gradients of the target network, such as the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD) and the Basic Iterative Method (BIM) [goodfellow_explaining_2014, madry_towards_2017, kurakin_adversarial_2016]. Some approaches focus on optimizing an attack objective function, like the Carlini & Wagner attack (C&W) and DeepFool [carlini_adversarial_2017, moosavi-dezfooli_deepfool:_2016], while others exploit the decision boundary to attack the network [brendel_decision-based_2018, chen_boundary_2019]. Black-box attacks mainly rely on transfer attacks: attackers substitute the target network with a surrogate network trained on the same dataset, and subsequently apply white-box attacks to the substitute network to generate adversarial samples.
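
As a concrete illustration of such gradient-based attacks, the sketch below implements single-step FGSM for a generic PyTorch classifier that returns logits; it is our own minimal example, not the attack configuration used later in this paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM: perturb each pixel by eps in the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```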

Adversarial training, proposed by [goodfellow_explaining_2014, madry_towards_2017, tramer_ensemble_2018, yan_deep_2018], is an approach to improve the robustness of the target network. Normally, it augments the training set with adversarial samples during a retraining phase. Adversarial training can achieve good results in defending against white-box and black-box attacks. However, it requires a sufficient amount and variety of adversarial samples, leading to high time complexity.

Denoising

Most denoising methods improve the intrinsic robustness of the target network by obfuscating gradients, through either non-differentiable operations or gradient vanishing (exploding). Various non-differentiable operations have been proposed, such as image quilting, total variance minimization and quantization [efros_image_2001, rudin_nonlinear_1992, guo_countering_2018]. Pixel denoising approaches utilize gradient vanishing (exploding) to thwart attacks and are widely developed on top of Generative Adversarial Networks (GANs), such as [meng_magnet:_2017]. However, the aforementioned approaches cannot easily thwart structure-replaced white-box attacks [athalye_obfuscated_2018]: attackers can still conduct attacks by approximating the gradients of the non-differentiable computations. Instead of relying on obfuscating gradients, our differentiable FPD can circumvent such structure-replaced white-box attacks.

Our proposal is partially related to [xie_feature_2019], as the denoising layers in our FPD are inspired by their feature denoising approach. Nevertheless, different from [xie_feature_2019], the principle behind our FPD is to improve the intrinsic robustness, regardless of whether adversarial training is conducted or not. Consequently, FPD includes not only two denoising modules, but also an image restoration module and a Lipschitz constant constrained classification layer, establishing a multi-task and self-supervised training environment. Moreover, we employ denoising layers in a much more effective way: instead of implanting them into all blocks of the enhanced CNN, only the shallow blocks are enhanced so as to maintain high-level abstract semantic information. We compare the performance of the FPD-enhanced CNN and the CNN enhanced by [xie_feature_2019] in Section 4.1.

3 Feature Pyramid Decoder

In this section, we introduce each component of our Feature Pyramid Decoder, shown in Figure 1. Firstly, we introduce the structure of the front and back denoising modules. Next, the structure of the image restoration module is depicted. Then, we modify the classification layer of the CNN by applying a Lipschitz constant constraint. Finally, our two-phase training strategy is introduced, utilizing ϵ-neighbourhood noisy images with multi-task and self-supervised learning.

3.1 Front and Back Denoising Module

A denoising module is a CNN implanted with certain inner denoising layers. Specifically, a group of inner denoising layers is implanted only into the shallow blocks of a block-based CNN. Consequently, the shallow features are processed to alleviate noise, whereas the deep features are directly decoded, helping to keep the abstract semantic information. Meanwhile, we employ a residual connection between the denoised features and the original features. In this way, most of the original features can be kept, which also helps the gradient updates.

Moreover, we modify the non-local means algorithm [buades_non-local_2005] by replacing the Gaussian filtering operator with a dot-product operator. It can be regarded as a self-attention mechanism interpreting the relationship between pixels. Compared with the Gaussian filtering operator, the dot-product operator helps improve adversarial robustness [xie_feature_2019]. Meanwhile, as the dot-product operator does not involve extra parameters, it contributes to relatively lower computational complexity. We explore the two inner denoising structures shown in Figure 2(a) and Figure 2(b); the corresponding performance comparison is conducted in Section 4.1. In our framework, the parameters of the front and back denoising modules are shared to shrink the network size. The motivation for exploiting the weight sharing mechanism has been explained in [kopuklu2019convolutional]: weight sharing not only reduces Memory Access Cost (MAC) but also provides more gradient updates to the reused layers from multiple parts of the network, leading to more diverse feature representations and helping the network to generalize better.
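
The sketch below is a minimal PyTorch rendering of such a dot-product non-local denoising layer with a residual connection; the 1x1 output projection and its zero initialization are our own assumptions rather than the exact layer used in FPD.

```python
import torch
import torch.nn as nn

class DotProductDenoise(nn.Module):
    """Dot-product non-local means over all spatial positions, plus a residual link."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.proj.weight)  # start close to an identity mapping
        nn.init.zeros_(self.proj.bias)

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)                                   # (B, C, N)
        sim = torch.bmm(flat.transpose(1, 2), flat)                  # (B, N, N) pixel-pair dot products
        denoised = torch.bmm(flat, sim.transpose(1, 2)) / (h * w)    # features averaged, weighted by similarity
        return x + self.proj(denoised.view(b, c, h, w))              # residual keeps the original features
```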

3.2 Image Restoration Module

To build the restoration module, we first upsample the feature maps from each block of the front denoising module (except the first block) to a consistent image dimension, and the upsampled feature maps are then fused. Finally, a group of transposed convolutions transforms the fused feature maps into an image with the same resolution as the input. On the other hand, we find that the restoration process introduces particular noise into the regenerated image. To minimize its influence, a middle denoising layer is applied to the regenerated image, as depicted in Figure 2(c). Together, the image restoration module and the denoising modules help establish our two-phase training strategy.
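
The following minimal sketch illustrates this upsample-fuse-decode pipeline in PyTorch; the channel widths, the number of decoder layers and the fusion by summation are our own assumptions, not the exact configuration of FPD.

```python
import torch.nn as nn
import torch.nn.functional as F

class RestorationModule(nn.Module):
    """Upsample deep feature maps, fuse them, and decode back to an image."""
    def __init__(self, in_channels=(512, 1024, 2048), mid_channels=64, out_channels=3):
        super().__init__()
        # 1x1 convolutions project each block's features to a common width
        self.lateral = nn.ModuleList([nn.Conv2d(c, mid_channels, kernel_size=1)
                                      for c in in_channels])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(mid_channels, mid_channels, 4, stride=2, padding=1),
            nn.ELU(),
            nn.ConvTranspose2d(mid_channels, mid_channels, 4, stride=2, padding=1),
            nn.ELU(),
            nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # regenerated pixels in [0, 1]
        )

    def forward(self, feats):
        # feats: feature maps from the deeper blocks of the front denoising module
        target = feats[0].shape[-2:]  # fuse at the finest of the given resolutions
        fused = sum(F.interpolate(l(f), size=target, mode='bilinear', align_corners=False)
                    for l, f in zip(self.lateral, feats))
        return self.decoder(fused)
```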

3.3 Lipschitz Constant Constrained Classification

The influence of employing a Lipschitz constant constraint on defending against adversarial samples has been analyzed in [finlay_improved_2018, huster_limitations_2018]. As stated in Theorem 1 below, the network can be sensitive to some perturbations if Softmax is directly used as the last layer's activation function. However, to the best of our knowledge, no previous defence against adversarial samples has adopted an additional output-layer activation function before Softmax.

Figure 3: Implementation details of the Lipschitz constant constrained classification layer. It is implemented by applying a squeezing activation function, i.e. Tanh, to the output of a fully connected layer.
Figure 4: Implementation details of the two-phase training strategy utilizing self-supervised and multi-task learning. The enhanced CNN (FPD) comprises the image restoration module, the front and back denoising modules and the modified classification layer; the noisy inputs are samples drawn from the ϵ-neighbourhood of each image. The first training phase is optimized by the restoration loss: while this loss exceeds a threshold, only the parameters of the front denoising module and the image restoration module are updated; once the loss reaches the threshold, the cross-entropy (CE) loss and the restoration loss jointly train the enhanced CNN. The second phase then trains the enhanced CNN further on the regenerated images, jointly optimized by the CE loss and the restoration loss.
Theorem 1 (the constraint on the Lipschitz constant for a fully-connected network).

Let f be a K-way, n-layer fully-connected network, f_i(x) be the i-th component of the network output given input x, W_l be the weight matrix of the l-th layer of the network, and b_l be the bias of the same layer. Let z_i(x) denote the i-th component of the input to Softmax given input x. With the Softmax function as the activation function of the output layer, denote the activation function of the earlier layers by σ and its Lipschitz constant by k. Then, for a noise vector δ, the variation |f_i(x + δ) − f_i(x)| can be bounded component-wise from above by a quantity that grows with the magnitude of the pre-Softmax input z_i, the Lipschitz constant k, the norms of the weight matrices W_l, and the noise magnitude ‖δ‖.

Require: clean images x with labels y; random sampler RS; noise bound ϵ; threshold τ; number of epochs E; optimizer (with learning rate, weight decay and seed); the enhanced CNN FPD, consisting of the image restoration module, the front denoising module, the back denoising module and the modified classification layer; the restoration loss L_r and the cross-entropy loss CE.
Ensure: the trained FPD
1:  Normalize each pixel of x into [0, 1]
2:  for e = 1 to E do
3:      δ ← RS(−ϵ, ϵ)
4:      x̃ ← x + δ
5:      Clip x̃ between [0, 1]
6:      (ŷ, x_r) ← FPD(x̃)    {predicted label and regenerated image}
7:      l_r ← L_r(x_r, x)
8:      if l_r > τ then
9:          Update only the parameters of the front denoising and image restoration modules with l_r
10:     else
11:         l_c ← CE(ŷ, y)
12:         Update the parameters of FPD with l_r + l_c
13:         (ŷ′, x_r′) ← FPD(x_r)
14:         l_r′ ← L_r(x_r′, x);  l_c′ ← CE(ŷ′, y)
15:         Update the parameters of FPD with l_r′ + l_c′
16:     end if
17: end for
18: return FPD
Algorithm 1: Detailed training procedure

We postpone the proof of Theorem 1 to the supplementary material. The theorem shows that, once regularization has driven the weights close to zero, the magnitude of the pre-Softmax input z_i and the Lipschitz constant of the activation before Softmax may have a more prominent influence on the variation of the output f_i than the weight norms. Therefore, we restrict both of them by applying a squeezing function with a small Lipschitz constant before Softmax in the output layer, potentially leading to a smaller output variation. Consequently, the output of the classification layer is more stable in the face of attack perturbations.
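
As a simple illustration of why the squeezing function helps (our own remark, not the paper's derivation): Tanh is 1-Lipschitz and bounded, so inserting it before Softmax both limits how much the logits can move under a perturbation and confines them to a fixed range:

```latex
% Let u_i(x) be the pre-squeeze logits and z_i(x) = tanh(u_i(x)) the squeezed
% logits fed to Softmax. Then, for any perturbation \delta,
\begin{aligned}
  |z_i(x+\delta) - z_i(x)| &= |\tanh(u_i(x+\delta)) - \tanh(u_i(x))|
                            \le |u_i(x+\delta) - u_i(x)|, \\
  |z_i(x)| &< 1 \quad \text{for every } x,
\end{aligned}
```

so the input to Softmax is both less sensitive to δ and confined to (−1, 1).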

To thwart various attacks, we use Tanh as our squeezing function, as shown in Figure 3. Moreover, we empirically replace all ReLU activation functions with ELU; this leads to a smoother classification boundary, thus adapting to more complex distributions.
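
A minimal PyTorch sketch of such a Lipschitz-constant-constrained classification head is given below; the global average pooling, the feature width and the class count are placeholder assumptions, and Softmax itself is applied implicitly by the cross-entropy loss during training.

```python
import torch.nn as nn

class LCCHead(nn.Module):
    """Fully connected classifier whose logits are squeezed by Tanh before Softmax."""
    def __init__(self, in_features=2048, num_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_features, num_classes)
        self.squeeze = nn.Tanh()  # bounded, 1-Lipschitz squeezing function

    def forward(self, feats):
        z = self.fc(self.pool(feats).flatten(1))
        return self.squeeze(z)    # squeezed logits; Softmax / CE is applied downstream
```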

Configuration | White-box Acc. | Black-box Acc. | Average
Inner denoising layer implanted position selection
  Shallow | 1.67% | 31.10% | 16.39%
  Deep | 2.08% | 27.02% | 14.55%
Denoising approaches selection
  Average | 11.04% | 15.99% | 13.51%
  Flip | 1.22% | 17.34% | 9.28%
  Mid | 0.32% | 53.77% | 27.05%
  Mid + Inner | 7.44% | 42.41% | 24.93%
Ablation study
  Inner denoising layers only | 1.67% | 31.10% | 16.34%
  Mid + Inner | 7.44% | 42.41% | 24.93%
  Full FPD | 25.55% | 62.72% | 44.14%
Activation function selection
  ReLU | 0.29% | 49.28% | 24.79%
  ELU | 0.28% | 61.24% | 30.76%
  ELU + Tanh | 0.25% | 69.11% | 34.68%
Bottleneck selection
  With bottleneck | 22.21% | 46.65% | 34.43%
  Without bottleneck | 25.55% | 62.72% | 44.14%
No. of inner denoising layers selection
  (setting 1) | 0.04% | 13.26% | 6.65%
  (setting 2) | 1.97% | 15.90% | 8.94%
Training strategy selection
  One-phase | 8.60% | 51.08% | 29.84%
  Two-phase | 25.55% | 62.72% | 44.14%
ResNet-101 enhanced by [xie_feature_2019] | 5.72% | 62.39% | 32.56%

Table 1: Overall results of the exploration experiments with ResNet-101 on MNIST.

3.4 Training Strategy

We carefully devise our training strategy and add uniformly sampled random noise to the clean images to further improve the enhanced CNN. The enhanced CNN (FPD) consists of the image restoration module, the front denoising module, the back denoising module and the modified classification layer.

To further improve the denoising and generalization capability, we regard the samples in the ϵ-neighbourhood of each image as a candidate set of adversarial samples. We add uniformly sampled random noise to the clean images using a sampler. It is impossible to use all samples in the candidate sets, but after training on such noisy images the enhanced CNN exhibits more stable performance when classifying images within a smaller ϵ-neighbourhood. The detailed training procedure is described in Algorithm 1.

We propose the two-phase training to drive the self-supervised and multi-task learning that jointly optimizes the enhanced CNN. It helps the enhanced CNN learn, at low cost, how to denoise images and abstract features from them, and it helps the enhanced CNN learn a much more accurate mapping between images and labels. As shown in Figure 4, the first phase mainly focuses on regenerating images and is optimized by the restoration loss. To guarantee the quality of the regenerated images used in the later training procedures, we set a threshold: if the restoration loss exceeds it, only the parameters of the front denoising module and the image restoration module are updated, so as to generate higher-quality regenerated images. Once the loss reaches the threshold, the cross-entropy (CE) loss and the restoration loss jointly train the enhanced CNN. The second phase then focuses on using the good-quality regenerated images to train the enhanced CNN further, jointly optimized by the CE loss and the restoration loss.
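
The sketch below translates Algorithm 1 into a minimal PyTorch training loop; the choice of L1 as the restoration loss, the threshold value and the optimizer handling are our own assumptions.

```python
import torch
import torch.nn.functional as F

def train_fpd(fpd, loader, optimizer, eps=0.3, tau=0.05, epochs=50, device='cuda'):
    """Two-phase training on eps-neighbourhood noisy images (cf. Algorithm 1)."""
    fpd.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            noise = torch.empty_like(x).uniform_(-eps, eps)   # eps-neighbourhood noise
            x_noisy = (x + noise).clamp(0.0, 1.0)

            logits, restored = fpd(x_noisy)
            rec_loss = F.l1_loss(restored, x)                 # self-supervised restoration target

            optimizer.zero_grad()
            if rec_loss.item() > tau:
                # Phase 1: restoration still poor; gradients only reach the
                # restoration path (front denoising, restoration, middle denoising).
                rec_loss.backward()
                optimizer.step()
            else:
                # Phase 1 (joint): restoration + classification losses.
                (rec_loss + F.cross_entropy(logits, y)).backward()
                optimizer.step()
                # Phase 2: train further on the regenerated image itself.
                optimizer.zero_grad()
                logits2, restored2 = fpd(restored.detach())
                (F.cross_entropy(logits2, y) + F.l1_loss(restored2, x)).backward()
                optimizer.step()
```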

4 Experiments

In this section, we first investigate the best framework structure through an exploration study, where we also compare with the most related work [xie_feature_2019]. In the comparison experiments, we then focus on comparing the robustness of the enhanced CNN and the original one, under adversarial training and normal training respectively. Owing to the unavoidable limitations of evaluating robustness, we apply various attacks to evaluate our performance; we cannot rule out that more effective attacks exist, so the trained networks will be released for future evaluation.

We employ the MNIST, Street View House Numbers (SVHN), CALTECH-101 and CALTECH-256 datasets in the following experiments. MNIST consists of a training set of 60,000 samples and a testing set of 10,000 samples. SVHN is a real-world colored digit image dataset; we use the format that includes 73,257 MNIST-like 32-by-32 images centered around a single character for training and 10,000 images for testing. For both MNIST and SVHN, we resize the images to 64-by-64; for MNIST, we additionally replicate the single channel three times for network consistency. For CALTECH-101 and CALTECH-256, we randomly choose 866 and 1,422 test images respectively and resize them to 224-by-224. We normalize pixel values into [0, 1]. ResNet-101, ResNet-50 [he_deep_2016] and ResNeXt-50 [xie_aggregated_2017] are enhanced in the following experiments. All experiments are implemented in PyTorch.
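
A minimal torchvision sketch of this preprocessing is shown below; the dataset roots and the absence of further augmentation are assumptions.

```python
from torchvision import datasets, transforms

# Resize to 64x64, replicate the MNIST channel three times, and let ToTensor
# map pixels into [0, 1].
mnist_tf = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.Grayscale(num_output_channels=3),  # repeat the single channel
    transforms.ToTensor(),
])
svhn_tf = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])

mnist_train = datasets.MNIST('data', train=True, download=True, transform=mnist_tf)
svhn_train = datasets.SVHN('data', split='train', download=True, transform=svhn_tf)
```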

4.1 Exploration Experiments

In this section, we conduct exploration experiments on the FPD-enhanced ResNet-101 on MNIST. In Table 1, we use an ℓ∞-PGD attack with parameters ϵ = 0.3, 40 steps and step size 0.01 for both the white-box and black-box settings. Under the black-box condition, we separately train a simple three-layer fully-connected network as the substitute network [papernot_practical_2017] for each network.
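
This evaluation can be reproduced with adversarial-robustness-toolbox (ART) [nicolae_adversarial_2018] roughly as sketched below; `model` stands for the network under attack (returning logits), and the wrapper arguments shown here are illustrative assumptions rather than the authors' exact evaluation script.

```python
import numpy as np
import torch.nn as nn
import torch.optim as optim
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import ProjectedGradientDescent

classifier = PyTorchClassifier(
    model=model,                      # network under attack, returning logits
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.SGD(model.parameters(), lr=0.01),
    input_shape=(3, 64, 64),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)
attack = ProjectedGradientDescent(estimator=classifier, norm=np.inf,
                                  eps=0.3, eps_step=0.01, max_iter=40)
x_adv = attack.generate(x=x_test)     # x_test: numpy array of test images
accuracy = (classifier.predict(x_adv).argmax(axis=1) == y_test).mean()
```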

Inner Denoising Layers Implanted Positions Selection

We first explore the position at which to implant the inner denoising layers. In Table 1, 'Shallow' means that the denoising layers are implanted into the first two residual blocks; likewise, 'Deep' means that they are implanted into the third and fourth residual blocks. We observe that 'Shallow' outperforms 'Deep' on average. This may be attributed to the high-level abstract semantic information preserved by directly decoding the deep features. In the following experiments, we always implant the inner denoising layers into the shallower blocks.

Denoising Approaches Selection

Next, we explore the best denoising operation. In Table 1, no denoising layers are implanted into either the front or the back denoising module for the 'Average', 'Flip' and 'Mid' approaches; in these three approaches, we only focus on cleaning the regenerated image before passing it to the back denoising module. Specifically, in 'Average' the regenerated image is averaged with the input; in 'Flip' the regenerated image is flipped; in 'Mid' the noise in the regenerated image is alleviated by the middle denoising layer depicted in Figure 2(c). Finally, 'Mid + Inner' means that we implant the two inner denoising layers into both the front and the back denoising modules, and the middle denoising layer is also utilized. Distinctly, 'Mid + Inner' is the most well-rounded option for defending against both black-box and white-box attacks, which we attribute to its stronger denoising capability.

Ablation Study

To validate the effectiveness of each module, we perform an ablation study. As shown in Table 1, the full FPD performs far better than the ablated variants in thwarting both white-box and black-box attacks. This overall robustness is owing to the increase in data diversity and the supervision signal brought by the image restoration module. Furthermore, the middle denoising layer can further clean the regenerated image to enhance the robustness against well-crafted perturbations.

Activation Functions Selection

We next explore the choice of activation function. Table 1 indicates that the ELU activation function outperforms ReLU. Furthermore, ELU combined with Tanh (as shown in Figure 3) achieves a 3.92% higher average accuracy than ELU alone. This indicates that ELU with Tanh is the suggested activation function selection.

Network | FGSM Acc / T(m) | PGD Acc / T(m) | C&W Acc / T(m) | FGSM Acc / T(m) | PGD Acc / T(m) | C&W Acc / T(m) | DeepFool Acc / T(m) | Average Acc / T(m)
Original ResNet-101 | 4% / 0.85 | 0% / 46.42 | 0% / 56.75 | 94% / 0.95 | 82% / 53.25 | 15% / 1183.67 | 6% / 364.13 | 27.57% / 243.72
Original, adv. trained (I) | 43% / 0.95 | 0% / 66.43 | 36.63% / 39.57 | 100% / 0.95 | 80% / 64.22 | 74% / 1177.27 | 6% / 365.17 | 48.52% / 244.94
Original, adv. trained (II) | 92% / 1.85 | 76% / 68.37 | 89.88% / 42.92 | 98% / 1.83 | 92% / 68.32 | 93% / 1193.5 | 9% / 344.38 | 78.56% / 245.88
FPD-enhanced | 31% / 1.5 | 0% / 72.3 | 88% / 212 | 98% / 1.73 | 95% / 98.58 | 95% / 1134.38 | 9.62% / 911.9 | 59.51% / 347.48
FPD-enhanced, adv. trained (I) | 42% / 0.95 | 0.87% / 119 | 46.12% / 181.6 | 97% / 2.25 | 95% / 113.97 | 97% / 1047.83 | 33.16% / 907.3 | 58.74% / 338.99
FPD-enhanced, adv. trained (II) | 87% / 1.6 | 64.03% / 115.85 | 78% / 160 | 100% / 3.17 | 97% / 108.5 | 100% / 1043.25 | 11.87% / 912.37 | 76.84% / 334.96

Table 2: Robustness evaluation results (Accuracy %, Attack Time (min)) in thwarting the white-box attacks with ResNet-101 on MNIST.

Inner Denoising Layers Selection

We also investigate the optimal number of inner denoising layers and whether to use a bottleneck in these inner layers. In Table 1, one configuration implants inner denoising layers with the bottleneck, as depicted in Figure 2(a), into each denoising module, together with the middle denoising layer described above; a second configuration is identical except that no middle denoising layer is involved; a third configuration does not use the bottleneck in the inner denoising layers, as depicted in Figure 2(b). We observe that the bottleneck reduces the average performance by around 10%. Moreover, although adding more inner denoising layers yields a small improvement, the enhancement is not worth the time complexity brought by the additional denoising layers. Therefore, we use the configuration without the bottleneck and with fewer inner denoising layers as our proposed framework in the following experiments.

Training Strategy Selection

We further demonstrate the efficacy of our two-phase training strategy, as depicted in Figure 4. We mainly compare it with a one-phase training strategy, i.e., only the first training phase described in Section 3.4. The results show that the two-phase strategy achieves a 14.3% higher average accuracy than the one-phase strategy.

Comparison with the Related Work

As mentioned in Section 2, the denoising approach proposed in [xie_feature_2019] is similar to our denoising layers in FPD. Therefore, we also conduct a comparison with [xie_feature_2019]. The last row of Table 1 represents ResNet-101 enhanced by [xie_feature_2019]. We observe that our FPD-enhanced ResNet-101 outperforms it; in particular, its accuracy under the white-box attack is about 20% higher.

4.2 Comparison Experiments

We conduct a series of comparison experiments (using adversarial-robustness-toolbox [nicolae_adversarial_2018], a tool for testing a network's robustness against various attacks) to further evaluate the FPD-enhanced CNN performance on MNIST, SVHN, CALTECH-101 and CALTECH-256.

Notation and Implementation Details

Firstly, let us define the following notation for an accurate description: for each architecture we consider the FPD-enhanced CNN and the original CNN, each of which may additionally be adversarially trained by PGD (on MNIST: ϵ = 0.3, 100 steps, step length 0.01; on SVHN: ϵ = 8/256, 40 steps, step length 2/256) or by FGSM (on MNIST: ϵ = 0.3). All results are obtained with batch size 100, running on an RTX Titan.
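
For reference, a minimal sketch of one PGD adversarial-training step with the MNIST settings above is given below; `model` is assumed to return logits, and folding this step into a full training schedule is left out.

```python
import torch
import torch.nn.functional as F

def pgd_adv_train_step(model, x, y, optimizer, eps=0.3, step=0.01, iters=100):
    """Craft an l-inf PGD example, then take one training step on it."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```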

On MNIST

For sufficient evaluation, we first focus on applying FPD to ResNet-101 on MNIST. We concentrate on two performance metrics: classification accuracy and attack time. A longer attack time means that the attacker has to spend more time and money crafting adversarial samples, may exceed its budget, and may therefore abandon the attack; in this sense, forcing attackers to spend more time also protects the network from another perspective.

We employ various white-box attacks against the original and enhanced networks and their adversarially trained versions. We consider ℓ2-PGD, ℓ2-FGSM, ℓ∞-PGD and ℓ∞-FGSM, setting ϵ = 1.5 and ϵ = 0.3 to bound the perturbations under the ℓ2 and ℓ∞ norms, respectively. Both ℓ2-PGD and ℓ∞-PGD attack for 100 iterations with step length 0.1.

Columns (left to right): accuracy on clean examples; then, for substitute networks derived from ResNet-101, ResNet-50 and ResNeXt-50 in turn, the first substitute attacked by FGSM, PGD and C&W and the second substitute attacked by FGSM and PGD; finally the row average.
ResNet-101 (original):     89% | 87%, 87%, 86%, 80%, 86% | 86%, 87%, 86%, 81%, 88.01% | 84%, 84%, 86%, 92%, 85.12% | 85.68%
ResNet-101 (FPD-enhanced): 84% | 82%, 83%, 84%, 64%, 78% | 82%, 83%, 84%, 76%, 83% | 80%, 82%, 84%, 82%, 82% | 80.6%
ResNet-50 (original):      85% | 81%, 83%, 88%, 78%, 81% | 80%, 82%, 88%, 76%, 80% | 92%, 92%, 88%, 92%, 92% | 84.87%
ResNet-50 (FPD-enhanced):  89% | 87%, 88%, 89%, 77%, 87% | 87%, 88%, 89%, 62%, 71% | 86%, 90%, 88%, 92%, 92% | 84.87%
ResNeXt-50 (original):     96% | 92%, 93.48%, 96%, 86%, 93.45% | 92%, 94.97%, 96%, 90%, 92% | 84%, 86%, 92%, 92%, 94% | 91.59%
ResNeXt-50 (FPD-enhanced): 86% | 80%, 84%, 86%, 74%, 82% | 84%, 84.81%, 86%, 80%, 84% | 66%, 62%, 86%, 80%, 82% | 80.05%
Column average:            88.17% | 84.83%, 86.41%, 88.17%, 76.5%, 84.58% | 85.17%, 86.63%, 88.17%, 77.5%, 83% | 82%, 82.67%, 87.33%, 88.33%, 87.85% | 84.61%

Table 3: Robustness evaluation results (Accuracy %) in thwarting black-box attacks with ResNet-101, ResNet-50 and ResNeXt-50 on SVHN.

Network | FGSM | PGD | C&W | C&W | DeepFool | Average
ResNet-101 (original) | 1% | 0% | 0% | 0% | 28% | 5.8%
ResNet-101 (original, adv. trained) | 57% | 36% | 39% | 1% | 3% | 27.2%
ResNet-101 (FPD-enhanced) | 44% | 44% | 71% | 62.22% | 53.25% | 54.89%
ResNet-101 (FPD-enhanced, adv. trained) | 48% | 47% | 72.57% | 77.7% | 57% | 60.45%
ResNet-50 (original) | 4% | 0% | 0% | 0% | 34% | 7.6%
ResNet-50 (original, adv. trained) | 55% | 26% | 28% | 0% | 11% | 24%
ResNet-50 (FPD-enhanced) | 33% | 30% | 61% | 52.03% | 36.78% | 42.56%
ResNet-50 (FPD-enhanced, adv. trained) | 39% | 35% | 70% | 73.3% | 45.38% | 52.54%
ResNeXt-50 (original) | 13% | 0% | 0% | 0.4% | 51.24% | 12.93%
ResNeXt-50 (original, adv. trained) | 58% | 36% | 46% | 4.5% | 24.27% | 33.75%
ResNeXt-50 (FPD-enhanced) | 80% | 80% | 86% | 86% | 83.17% | 83.03%
ResNeXt-50 (FPD-enhanced, adv. trained) | 80% | 78% | 86% | 84.15% | 84% | 82.43%

Table 4: Robustness evaluation results (Accuracy %) in thwarting the white-box attacks with ResNet-101, ResNet-50 and ResNeXt-50 on SVHN.

We have the following remarks on our results, shown in Table 2. Generally, the FPD-enhanced network and its adversarially trained versions outperform the original network and its adversarially trained versions in average accuracy by around 32% and 10%, respectively. One adversarially trained original network appears slightly more robust than its enhanced counterpart; however, as revealed by the average attack time, considerably more computational time (around 89 min) is spent on attacking the enhanced one. In particular, the overall time spent on attacking the enhanced network and its adversarially trained versions exceeds that of the corresponding original networks by around 104, 94 and 89 minutes, respectively. These results demonstrate that the enhanced network and its adversarially trained versions are harder to attack.

On SVHN

We mainly assess the ability of FPD to enhance various block-based CNNs on colored samples: ResNet-101, ResNet-50 and ResNeXt-50. We employ a series of white-box and black-box attacks against the original and enhanced versions of each block-based CNN, with and without adversarial training. Initially, we evaluate FPD performance in thwarting black-box attacks. As shown in Table 3, the original and enhanced versions of each block-based CNN are employed as substitutes, and we adopt FGSM and PGD to attack them. Besides, we observe in Table 4 that some networks are particularly vulnerable to the C&W attack; therefore, we additionally adopt C&W against one of the substitutes to further evaluate FPD. As for white-box attacks, we adopt FGSM, PGD, C&W (under two norms) and DeepFool. We set ϵ = 8/256 for the above-mentioned attacks, and PGD attacks for 40 iterations with step length 2/256.

(a) Adversarial images.
(b) Restoration (Adv).
(c) Restoration (Clean).
Figure 5: Adversarial images (a) vs. the output of image restoration module from adversarial images (b) and clean images (c). Images are reproduced from the data in Table 4 (enhanced ResNet-101 attacked by PGD).

We have the following remarks on our results, shown in Table 3 and Table 4. Firstly, in defending against white-box attacks, the enhanced networks and their adversarially trained versions far outperform the original networks in accuracy for all the block-based CNNs, especially ResNet-101 and ResNeXt-50. We notice that the performance of the enhanced networks under black-box attacks is not entirely satisfactory, yet the outcome is not very surprising. As shown in Table 4, the enhanced networks achieve high accuracy under white-box attacks; therefore, when these attacks are applied to the enhanced substitutes, some of them effectively fail and return a large number of clean samples as "adversarial" examples. Given that the enhanced network has a lower accuracy than the original one on clean samples for ResNet-101 and ResNeXt-50 (as depicted in Table 3), the enhanced networks show this biased performance under black-box attacks.

We also show the output of the image restoration module in Figure 5. Adversarial images are well "denoised", as can be seen by comparing Figure 5(a) with 5(b). Figures 5(b) and 5(c) illustrate that the module outputs generated from adversarial and clean images are quite similar. This suggests that the restoration module generates similar images from both adversarial and clean inputs, leading to more robust performance in defending against attacks.

On CALTECH-101 & CALTECH-256

We further demonstrate the efficacy of FPD with ResNet-101 on the high-dimensional CALTECH-101 and CALTECH-256 datasets, attacked by PGD for 40 iterations with ϵ = 8/256 and step length 2/256. To be specific, on CALTECH-101, the enhanced ResNet-101 achieves 61.78% accuracy under the PGD attack, outperforming the original network by around 34.64%. On CALTECH-256, our enhanced ResNet-101 achieves 49.79% accuracy against 0.00% for the original one.

In summary, the above results demonstrate that the FPD-enhanced CNN is much more robust than its non-enhanced version on MNIST and on the high-dimensional CALTECH-101 and CALTECH-256 datasets. On the colored SVHN dataset, the performance under black-box attacks is not entirely satisfactory; however, considering the performance in thwarting white-box attacks, the FPD-enhanced CNN still performs far better than the non-enhanced versions.

5 Conclusion

In this paper, we have presented a novel Feature Pyramid Decoder (FPD) to enhance the intrinsic robustness of block-based CNNs. Besides, we have devised a novel two-phase training strategy. Through exploration experiments, we have investigated the best structure for our FPD, and through a series of comparison experiments we have demonstrated its effectiveness. Attacking these models with a variety of white-box and black-box attacks, we have shown that the proposed FPD can enhance the robustness of CNNs. We plan to design a more powerful decoder to improve the denoising capability, to exploit a hard threshold that filters out relatively bad restored images for further improving classification accuracy, and finally to transplant FPD to non-block CNNs.

6 Acknowledgement

This paper is supported by the Fundamental Research Fund of Shandong Academy of Sciences (NO. 2018:12-16), Major Scientific and Technological Innovation Projects of Shandong Province, China (No. 2019JZZY020128), as well as AcRF Tier 2 Grant MOE2016-T2-2-022 and AcRF Tier 1 Grant RG17/19, Singapore.

References