Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness

05/30/2019 · Adnan Siraj Rakin et al. · University of Central Florida

Deep Neural Networks (DNNs) trained by gradient descent are known to be vulnerable to maliciously perturbed adversarial inputs, a.k.a. adversarial attacks. As a countermeasure, increasing the model capacity has been discussed and reported as an effective way to enhance DNN robustness by many recent works. In this work, we show that shrinking the model size through proper weight pruning can even help to improve DNN robustness under adversarial attack. To obtain a simultaneously robust and compact DNN model, we propose a multi-objective training method called Robust Sparse Regularization (RSR), which fuses several regularization techniques: channel-wise noise injection, a lasso weight penalty and adversarial training. We conduct extensive experiments across the popular ResNet-20, ResNet-18 and VGG-16 DNN architectures to demonstrate the effectiveness of RSR against popular white-box (i.e., PGD and FGSM) and black-box attacks. Thanks to RSR, 85% of the weights of ResNet-18 can be pruned while still achieving 0.68% and 8.72% improvement in clean-data and perturbed-data accuracy, respectively, on the CIFAR-10 dataset, in comparison to its PGD adversarial training baseline.


1 Introduction

Deep Neural Networks (DNNs) have led to tremendous success in various applications, such as image classification Hinton et al. [2012b], speech recognition Hinton et al. [2012a] and medical applications Hung et al. [2017]. The wide deployment of DNNs has raised several major security concerns Goodfellow et al. [2014], Akhtar and Mian [2018], Chen et al. [2017a]. For example, in the context of image classification, an adversarial example is a carefully modified image whose perturbation is visually imperceptible to human eyes but successfully fools the DNN Goodfellow et al. [2014]. Recently, a cohort of works has developed new adversarial attack techniques that expose the underlying vulnerability of DNNs Athalye et al. [2018], Madry et al. [2018]. To counter adversarial attacks, several works have proposed different techniques, such as training the network with adversarial samples Madry et al. [2018], Goodfellow et al. [2014], regularization He et al. [2019], Lin et al. [2019] and various other methods Raghunathan et al. [2018], Samangouei et al. [2018].

In a separate yet related track, the investigation of efficient and compact networks has also accelerated. Many prior works address compression techniques, including quantization Zhou et al. [2016], Courbariaux et al. [2015, 2016], He and Fan [2019] and weight pruning Han et al. [2015b], Molchanov et al. [2017], Wen et al. [2016]. It has been shown that many DNNs can function properly (with no accuracy loss) even after significant (>90%) network pruning Han et al. [2015a], Molchanov et al. [2017]. Such sparse DNNs achieve significant speed-up and compression rates, which opens the door to DNNs in memory- and resource-constrained applications Han et al. [2016]. Previously, several works have tried to generate both sparse and robust networks Guo et al. [2018], Ye et al. [2019] by combining network pruning (i.e., compactness) with defenses against adversarial examples. However, these efforts either suffer from poor test-data accuracy or do not improve robustness significantly.

Figure 1: Weight distribution of the second convolution layer of ResNet-18 for three cases: 1) clean-data training, 2) adversarial training (PGD) and 3) RSR training. Observation 1: the adversarially trained network has less sparsity than the clean-trained one, which makes network pruning difficult. Observation 2: RSR training achieves the desired sparseness together with adversarial training.

Overview of RSR.

In this work, we propose a multi-objective optimization mechanism that merges these two different yet related tracks, namely the pursuit of network robustness and of network compression. To achieve this objective, we propose a novel Robust Sparse Regularization (RSR) method which integrates several regularization techniques into one dual optimization. First, we propose to train the DNN with channel-wise noise injection (CNI) embedded in adversarial training to improve network robustness. This technique injects channel-wise Gaussian noise whose scaling is trainable during adversarial training; CNI improves test accuracy on both clean and perturbed data. Second, in order to simultaneously achieve network compactness and robustness, we propose a new ensemble loss function that includes an ℓ1 weight penalty term (i.e., lasso). Lasso regularization during adversarial training performs weight selection by constraining some weight values to very small values, as shown in Figure 1. When training is done, we prune the small weights below a threshold to obtain a sparse network. Our extensive experiments show that RSR training is an effective network pruning scheme that achieves improved robustness without sacrificing any clean-data accuracy across different architectures.

2 Related Works

2.1 Adversarial Attack

Recently developed adversarial attack methods can completely fool a trained DNN through maliciously perturbed input data. Adversarial attacks can generally be classified into two major categories: white-box attacks, which assume the adversary has full access to the trained model and its parameters, and black-box attacks, which assume the adversary treats the model as a black box and can only access its inputs and outputs. We briefly introduce the white-box and black-box attack methods studied in this work.

FGSM Attack.

As one of the most efficient attack methods, the Fast Gradient Sign Method (FGSM) Goodfellow et al. [2014] uses a single step to generate an adversarial example. Given a vector input $x$ and target label $t$, FGSM alters each element of $x$ in the direction of its gradient w.r.t. the loss $\mathcal{L}$. As in the original paper Goodfellow et al. [2014], we define the generation of adversarial examples as:

$\hat{x} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(f(x;\theta), t)\big)$   (1)

Here, the perturbation constraint is $\|\hat{x} - x\|_\infty \le \epsilon$; by varying the value of $\epsilon$ we can vary the attack strength. $f(x;\theta)$ denotes the output of the DNN with parameters $\theta$, and $\mathrm{sign}(\cdot)$ is the sign function. To ensure a valid pixel range for the images, we clip $\hat{x}$ so that $\hat{x} \in [0, 1]$.
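To make the attack concrete, below is a minimal PyTorch-style sketch of single-step FGSM generation. It assumes inputs scaled to [0, 1] and a model returning raw logits; the default ε is a placeholder, and this is an illustration rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, target, eps=8/255):
    """Single-step FGSM (Eq. 1): perturb x along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)   # loss L(f(x; theta), t)
    loss.backward()
    x_adv = x + eps * x.grad.sign()            # step of size eps in the sign direction
    return x_adv.clamp(0.0, 1.0).detach()      # clip to the valid pixel range
```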

PGD Attack.

Projected Gradient Descent (PGD) Madry et al. [2018] is a powerful multi-step attack. It is an iterative version of FGSM, with $\hat{x}^{0} = x$ as the initialization. The perturbed data is updated during a multi-step iteration process; at the $k$-th step it can be expressed as:

$\hat{x}^{k+1} = \Pi_{P_\epsilon}\Big(\hat{x}^{k} + a \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(f(\hat{x}^{k};\theta), t)\big)\Big)$   (2)

where $\Pi_{P_\epsilon}$ projects onto the allowed perturbation space $P_\epsilon$ (the $\ell_\infty$ ball of radius $\epsilon$ around $x$) and $a$ is the step size. In Madry et al. [2018], it is also claimed that PGD is a universal adversary among all first-order adversaries, since it relies only on first-order information.
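A corresponding PGD sketch is given below, under the same assumptions as the FGSM snippet and with a random start inside the ε-ball as in Madry et al. [2018]. The ε and step-size defaults are placeholders rather than the paper's exact settings; the 7-step default matches Section 4.1.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, target, eps=8/255, step_size=2/255, steps=7):
    """Multi-step PGD (Eq. 2): FGSM-style steps projected back into the eps-ball."""
    x_orig = x.clone().detach()
    x_adv = (x_orig + torch.empty_like(x_orig).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step_size * grad.sign()
        # projection onto the eps-ball around the clean input, then onto valid pixels
        x_adv = torch.min(torch.max(x_adv, x_orig - eps), x_orig + eps).clamp(0, 1)
    return x_adv.detach()
```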

Black-box Attacks.

In this work, we also evaluate our proposed method against a range of black-box attacks. Specifically, we investigate the transfer attack Liu et al. [2016], in which an adversarial example generated from a source model is used to attack a different target model; the source and target models can differ, but they are trained on the same training dataset. Moreover, the Zeroth Order Optimization (ZOO) attack Chen et al. [2017b] is also investigated. The ZOO attack does not require training a substitute model; it directly approximates the gradient based only on the input data and the model's output scores.

2.2 Adversarial Defenses

Several works Madry et al. [2018], Goodfellow et al. [2014] have proposed to jointly train the network with adversarial and clean samples, known as adversarial training, to achieve network robustness. Later, the development of the Backward Pass Differentiable Approximation (BPDA) attack Athalye et al. [2018] exposed the underlying vulnerability of many other defense methods that rely on gradient obfuscation Dhillon et al. [2018], Xie et al. [2018]. Since then, training the network with adversarial examples has become one of the most popular approaches for defending against adversarial examples. Meanwhile, a cohort of works has investigated the effect of regularization techniques on robustness, such as quantization Rakin et al. [2018], Lin et al. [2019], noise injection He et al. [2019], Lecuyer et al. [2018a], Liu et al. [2017], Yoshida and Miyato [2017], Lecuyer et al. [2018b] and pruning Dhillon et al. [2018], Ye et al. [2019, 2018], Guo et al. [2018]. Several previous works have studied the effects of network pruning on robustness Guo et al. [2018], Ye et al. [2019, 2018]. Recently, Ye et al. [2019] proposed concurrent weight pruning and adversarial training to generate a robust and sparse network; however, their ADMM-based pruning method suffers from poor test accuracy on both clean and adversarial data for smaller networks (i.e., with lesser width). Further, Guo et al. [2018] showed that a pruned network can defend against adversarial examples provided that it is not over-sparsified.

3 Approach

In this section, we introduce the proposed Robust Sparse Regularization (RSR) technique, which is incorporated into a multi-objective optimization process that simultaneously improves network robustness and compactness. RSR mainly consists of two components: 1) a trainable Channel-wise Noise Injection (CNI) and 2) a lasso weight penalty (ℓ1 norm) for model pruning, both introduced below.

3.1 Adversarial Training

Training the neural network with adversarial examples is a popular defense method Madry et al. [2018], Goodfellow et al. [2014]. Since our method integrates with such adversarial training, we briefly introduce it first. Given a set of inputs $x$ and target labels $t$, adversarial training tries to obtain the optimal network parameters $\theta$ (i.e., weights and biases) for the following min-max optimization problem:

$\min_{\theta} \; \mathbb{E}_{(x,t)} \Big[ \max_{\|\hat{x}-x\|_\infty \le \epsilon} \mathcal{L}\big(f(\hat{x};\theta), t\big) \Big]$   (3)

which is composed of an inner maximization and an outer minimization problem. For the inner maximization we acquire the perturbed data $\hat{x}$ with the PGD attack described above Madry et al. [2018], while the outer minimization is optimized through gradient descent during network training.
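A sketch of one training step for this min-max problem is shown below, using the earlier PGD snippet for the inner maximization. This is illustrative only; the full RSR objective in Section 3.2 additionally mixes in clean data and noise injection.

```python
import torch
import torch.nn.functional as F

def adv_training_step(model, optimizer, x, target, attack):
    """One step of Eq. 3: inner max via `attack` (e.g. pgd_attack), outer min via SGD."""
    model.eval()
    x_adv = attack(model, x, target)           # inner maximization: craft perturbed data
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), target)
    loss.backward()                            # outer minimization: gradient step on theta
    optimizer.step()
    return loss.item()
```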

3.2 Channel-wise Noise Injection

The first regularization technique used in RSR is to inject learnable channel-wise noise into the weights during DNN adversarial training. Consider a convolution layer with 4-D weight tensor $W \in \mathbb{R}^{c_{out} \times c_{in} \times k_h \times k_w}$, where $c_{out}$, $c_{in}$, $k_h$ and $k_w$ denote the number of output channels, number of input channels, kernel height and kernel width, respectively. The Channel-wise Noise Injection (CNI) can be mathematically described as:

$\tilde{W}_i = W_i + \alpha_i \cdot \eta_i, \qquad \eta_i \sim \mathcal{N}(0, \sigma_i^2), \qquad i = 1, \dots, c_{out}$   (4)

where $\alpha_i$ is a trainable noise scaling coefficient and $\eta_i$ is a noise tensor whose elements are independently sampled from a Gaussian source with zero mean and variance $\sigma_i^2$. Note that $\sigma_i^2$ is the variance of $W_i$, statistically calculated at run time. Preliminary work He et al. [2019] shows that parametric noise injection (PNI) is an improved variant of adversarial training, where such a trainable noise injection method effectively regularizes the DNN during adversarial training. We follow a similar optimization and update rule for $\alpha$, but extend it to a channel-wise version, where the weights of each output channel share one noise scaling coefficient instead of the whole layer sharing a single coefficient.
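The sketch below illustrates one way to implement such a channel-wise noisy convolution in PyTorch. It follows Eq. 4 in spirit, but the initialization of α and other details are assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyConv2d(nn.Conv2d):
    """Convolution with Channel-wise Noise Injection (Eq. 4): each output channel i
    gets Gaussian noise with the std of that channel's weights, scaled by a
    trainable coefficient alpha_i."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # one trainable noise-scaling coefficient per output channel (init value assumed)
        self.alpha = nn.Parameter(torch.full((self.out_channels, 1, 1, 1), 0.25))

    def forward(self, x):
        # per-output-channel weight std, computed statistically at run time
        sigma = self.weight.detach().view(self.out_channels, -1).std(dim=1)
        noise = torch.randn_like(self.weight) * sigma.view(-1, 1, 1, 1)
        noisy_weight = self.weight + self.alpha * noise
        return F.conv2d(x, noisy_weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Swapping such a layer in for each nn.Conv2d of a ResNet or VGG backbone would give a CNI-style model in the spirit of the experiments below.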

We train the network with both clean and adversarial samples to achieve a good balance between adversarial and clean test-data accuracy. The optimization problem of Equation 3 can be solved by minimizing the ensemble loss in Equation 5, which is the weighted sum of the losses on clean and adversarial data with channel-wise trainable noise injected into the weights of the DNN model:

$\mathcal{L}_{ens} = w_c \cdot \mathcal{L}\big(f(x; \tilde{W}), t\big) + (1 - w_c) \cdot \mathcal{L}\big(f(\hat{x}; \tilde{W}), t\big)$   (5)

where $w_c$ is the coefficient that balances the two ensemble loss terms, chosen as 0.5 by default. Optimizing this loss function improves network robustness: the optimizer solves for both the model parameters and the noise coefficients $\alpha$ to find an equilibrium between clean- and perturbed-data accuracy.
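In code, the ensemble objective can be sketched as follows (w_c = 0.5 by default as stated above; the model is assumed to contain noisy layers such as the NoisyConv2d sketch, so noise is injected in both forward passes):

```python
import torch.nn.functional as F

def ensemble_loss(model, x, x_adv, target, w_c=0.5):
    """Eq. 5: weighted sum of clean-data and adversarial-data losses."""
    loss_clean = F.cross_entropy(model(x), target)
    loss_adv = F.cross_entropy(model(x_adv), target)
    return w_c * loss_clean + (1.0 - w_c) * loss_adv
```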

3.3 Lasso Weight Penalty

For incorporating network pruning into adversarial training, we propose to train the neural network with a lasso weight penalty. Lasso, the least absolute shrinkage and selection operator Tibshirani [1996], was introduced as an ℓ1 regularizer that penalizes weights with larger magnitudes. Lasso is an ideal choice for weight pruning as it shrinks the less important weights towards zero He et al. [2017], Wang et al. [2018], Wen et al. [2016]. We include the lasso weight penalty term in $\mathcal{L}_{ens}$ and reformat Equation 5 as:

$\mathcal{L} = \mathcal{L}_{ens} + \lambda \sum_{l=1}^{L} \|W_l\|_1$   (6)

where $W_l$ denotes the weight tensor of the $l$-th layer, $L$ is the total number of parametric layers (i.e., convolution and fully-connected layers), and $\|\cdot\|_1$ is the absolute sum of all elements of a tensor. The effect of the lasso weight penalty is determined by the coefficient $\lambda$: a larger value generates a sparser model containing a significant number of weights with near-zero values. We tune $\lambda$ experimentally and describe the procedure for selecting its optimal value in Section 4.2.
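The lasso term of Eq. 6 can be added to the ensemble loss as in the sketch below; λ is passed in explicitly since its value is tuned per architecture.

```python
import torch.nn as nn

def lasso_penalty(model, lam):
    """Eq. 6 regularizer: lambda times the L1 norm of all conv/FC weights."""
    l1 = sum(m.weight.abs().sum()
             for m in model.modules()
             if isinstance(m, (nn.Conv2d, nn.Linear)))
    return lam * l1

# total RSR objective (sketch):
# loss = ensemble_loss(model, x, x_adv, target) + lasso_penalty(model, lam)
```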

3.4 Weight Pruning

The proposed ensemble loss serves as a multi-objective loss function. We expect a network trained with it to be more resilient to adversarial samples and, due to the lasso weight penalty, we expect a significant portion of the weight tensor to converge to near-zero values. We then perform weight pruning after training by setting the weights below a certain threshold ($\gamma$) to zero. Note that, after pruning, we remove the noise injection term for zero-valued weights; as a result, during inference we only add noise to the non-zero elements of the weight tensor. For the weight tensor of a fully-connected layer, let $W \in \mathbb{R}^{m \times n}$; for a convolution layer, $W \in \mathbb{R}^{c_{out} \times c_{in} \times k_h \times k_w}$. Then, the pruning operation can be described as:

FC layer:    $W_{i,j} = 0$   if $|W_{i,j}| < \gamma$   (7)
Conv. layer: $W_{i,j,h,w} = 0$   if $|W_{i,j,h,w}| < \gamma$   (8)

Here $\gamma$ is the pruning threshold, so that after pruning it lower-bounds the absolute value of the remaining non-zero weights. We can tune $\gamma$ for different networks to achieve different sparsity ratios; hence, by tuning $\gamma$, we can effectively show the maximum amount of parameters that can be pruned without causing robustness degradation.
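A sketch of the threshold pruning of Eqs. 7-8, applied to all convolution and fully-connected layers, is shown below. This is illustrative; in the paper, noise injection is additionally disabled for the pruned, zero-valued weights at inference.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_by_threshold(model, gamma):
    """Zero every weight with magnitude below gamma (Eqs. 7-8) and return sparsity (%)."""
    total, zeroed = 0, 0
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            mask = m.weight.abs() >= gamma
            m.weight.mul_(mask)                 # prune: small-magnitude weights set to zero
            total += m.weight.numel()
            zeroed += (~mask).sum().item()
    return 100.0 * zeroed / total               # percentage of weights exactly zero
```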

4 Experiments

4.1 Experiment setup

Datasets and network architectures.

In this work, we only consider the CIFAR-10 dataset for the image classification task, as most of the baseline works report their robustness in terms of under-attack accuracy on this dataset. CIFAR-10 is composed of 50K training samples and 10K test samples. Our data augmentation method is the same as described in He et al. [2016]. The attacker can directly add noise to the natural images, as our data normalization layer is placed in front of the DNN as a non-trainable layer. We adopt three classical networks, ResNet-20, ResNet-18 He et al. [2016] and VGG-16 Simonyan and Zisserman [2014], to perform a comparative analysis. We also analyze the effect of network width by varying the width of ResNet-18. We report the mean accuracy over 5 trials due to the presence of randomness in both CNI and PGD Madry et al. [2018]. We tune the hyperparameter $\lambda$ separately for ResNet-18/VGG-16 and for ResNet-20, following the procedure in Section 4.2.

Adversarial attacks.

To attack the CIFAR-10 models with PGD, we use the same attack hyperparameters as in Madry et al. [2018], with the number of attack steps set to 7. For FGSM, the attack parameters (i.e., $\epsilon$) remain the same as for PGD. Moreover, we also evaluate the RSR defense against several state-of-the-art black-box attacks (i.e., the ZOO Chen et al. [2017b] and transfer Liu et al. [2016] attacks) in order to test the proposed RSR against a wide range of attacks.

Competing methods for adversarial defense.

In this work, PGD adversarial training Madry et al. [2018] is selected as the primary baseline. Since our work includes channel-wise noise injection, we also compare against parametric noise injection (PNI) He et al. [2019]. Additionally, we compare with several network compression and pruning methods Ye et al. [2019], Lin et al. [2019]. Finally, we present a comparison with several state-of-the-art regularization techniques that serve as adversarial defenses Lecuyer et al. [2018a], Liu et al. [2017].

4.2 Results

White-Box Attack.

Our simulation results for two popular white-box attacks, PGD Madry et al. [2018] and FGSM Goodfellow et al. [2014], are presented in Table 1. During adversarial training, as stated in Section 3.1, we use the PGD algorithm to generate the adversarial samples. First, for the regular (unpruned) models, we do not perform any weight pruning. RSR achieves a significant robustness enhancement and even improves clean-data accuracy compared to the baseline PGD training Madry et al. [2018]. We also observe that network robustness increases with model capacity, which is consistent with previous works Madry et al. [2018], He et al. [2019]; the pattern remains the same for our proposed RSR. Our best accuracy is obtained with the VGG-16 network, where we improve clean test-data accuracy by 0.95% and perturbed-data accuracy by 9.48% under strong PGD attack.

Scheme            ResNet-20 (269,722 params)        ResNet-18 (11,173,962 params)     VGG-16 (138,357,544 params)
                  Clean   PGD    FGSM   Sparsity    Clean   PGD    FGSM   Sparsity    Clean   PGD    FGSM   Sparsity
Before Pruning
  PGD             83.58   39.44  46.87  0           86.11   44.31  53.52  0           82.88   37.57  46.94  0
  CNI             84.67   46.11  54.40  0           86.82   47.85  56.04  0           83.13   44.23  51.56  0
  Lasso           83.56   38.69  45.78  0           85.92   46.94  55.20  0           83.26   41.93  50.33  0
  RSR             84.96   47.95  56.72  0           86.95   52.94  60.89  0           83.83   47.05  54.05  0
After Pruning
  PGD             51.58   12.49  16.11  60.47       70.31   31.00  35.80  85.43       78.40   32.14  42.21  50.62
  CNI             55.93   23.91  29.11  60.74       50.97   22.54  25.31  85.27       75.79   40.39  46.37  50.62
  Lasso           83.64   38.46  45.44  60.14       85.92   46.80  55.20  85.38       83.24   42.01  50.32  50.15
  RSR             84.32   47.44  55.74  60.85       86.79   53.03  60.35  85.36       83.02   47.70  54.16  50.93

Table 1: Summary of CIFAR-10 results. We report clean and perturbed-data (under PGD and FGSM attack) accuracy (%) on CIFAR-10 test data. To visualize the effect of lasso and CNI individually, we also report test accuracy for training with channel-wise noise injection (CNI) only and with the lasso loss only. Sparsity (%) is the percentage of weights pruned (exactly equal to zero). Capacity (in parentheses) denotes the number of trainable parameters in each network.

Our proposed RSR can prune 60%, 85% and 50% of the weights of ResNet-20, ResNet-18 and VGG-16, respectively, without any clean test accuracy loss. To show the comparative effect of network robustness and sparsity, we prune each of the four training cases (PGD/CNI/Lasso/RSR) by an equal amount; the level of sparsity can always be tuned by choosing different values of $\gamma$. As expected, the performance of both PGD and CNI suffers significantly after pruning. In contrast, pruned RSR outperforms even the unpruned baseline PGD training method: we observe 8.72% and 6.83% improvement in test accuracy under PGD and FGSM attack, respectively, for the ResNet-18 architecture. Again, the most significant improvement is observed for the VGG-16 network, which has the largest capacity. This observation confirms that increasing the number of parameters increases the effect of the weight penalty and noise injection in enhancing network robustness. Another question to ask is what happens if we prune the network beyond the reported sparsity; for example, if we prune ResNet-18 beyond 85%, does the network still remain robust? We answer this question in the next paragraph, where we examine the effect of network width on sparsity.

Effect of Network Width.

Ye et al. [2019] demonstrated that decreasing network width may have a negative impact on robustness. To verify whether our method follows the same trend, we present an ablation study on ResNet-18 with decreasing network width in Table 2. Our observation confirms that RSR remains more robust than the baseline PGD method for each width setting. On the other hand, we achieve less sparsity on networks with smaller width: in the 0.125x case, we could only achieve 38.33% sparsity without sacrificing any clean or perturbed-data accuracy. This observation is intuitive, as the 0.125x ResNet-18 already has far fewer parameters than the full-width (1x) ResNet-18. Thus, even with a smaller amount of weight pruning, the percentage of parameters remaining in the network relative to the full-width ResNet-18 (7.7%) is still smaller than that of the pruned full-width ResNet-18 (14.37%). Finally, this observation also answers the question asked previously: a particular architecture (e.g., ResNet-18) can be pruned up to a certain sparsity level that depends on the network width, and the maximum number of parameters that can be pruned without any sacrifice in robustness may vary across architectures. Figure 2(a) shows that ResNet-20, ResNet-18 and VGG-16 test accuracy under PGD attack starts to drop at different sparsity levels (% of weights equal to zero). If a model is sparsified beyond this point, it falls under the definition of an over-sparsified model Guo et al. [2018] and no longer remains robust.

Channel   Clean Test (%)           Adversarial Attack, PGD (%)   Sparsity (%)             (%) of parameters remaining
Width     Adv. Trained   RSR       Adv. Trained   RSR            Adv. Trained   RSR       vs. ResNet-18 (1x), RSR
0.125x    82.68          83.18     39.01          45.38          0              38.33     (100-38.33)x0.125 = 7.7
0.25x     84.99          84.85     43.33          50.70          0              63.17     (100-63.17)x0.25 = 9.21
1x        86.82          86.79     47.85          53.03          0              85.36     (100-85.36)x1 = 14.37

Table 2: Ablation study with varying network width. We report clean and perturbed-data (under PGD attack) accuracy on CIFAR-10. ResNet-18 (1x) is chosen as the baseline. Network widths 0.125x and 0.25x denote that both the input and output channel widths of the network are scaled by 0.125 and 0.25, respectively.

Does the robustness improvement come from lasso training, CNI training, or both?

We have provided a comprehensive experimental analysis of our proposed RSR method to show its performance on three fronts: clean-data accuracy, robustness (i.e., under-attack accuracy) and sparsity. Table 1 confirms that the lasso loss primarily contributes to sparse model generation through weight shrinkage. To identify the chief contributor to the robustness improvement, Table 1 also reports the effect of training the network with only the lasso loss and with only CNI, respectively. The regularization effect of lasso is less significant for ResNet-20, where CNI plays the dominant role in improving robustness. However, both lasso and channel-wise noise injection contribute to the robustness improvement for more redundant networks (i.e., VGG-16): on VGG-16, lasso and CNI improve network robustness by close to 4% and 7%, respectively. Nonetheless, we choose lasso because it shrinks weight values to very small magnitudes, thus performing robust model selection during adversarial training; in addition, lasso regularization supplements CNI in defending against adversarial examples.

Choice of Lambda ($\lambda$).

In Figure 2(b), we plot test accuracy on both clean and perturbed data versus $\lambda$ for ResNet-20. Both accuracies start to drop once $\lambda$ is increased beyond a certain value, so for ResNet-20 we choose the largest $\lambda$ that achieves maximum sparsity without any degradation in test accuracy. Similarly, the value of $\lambda$ for the other architectures (i.e., ResNet-18, VGG-16) is optimized experimentally.
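As a concrete illustration of this tuning procedure, the sketch below sweeps candidate λ values and keeps the largest one whose clean accuracy stays within a small tolerance of the λ = 0 baseline. The helper names (`train_rsr`, `evaluate`) are hypothetical placeholders for the training and evaluation routines described above, not the authors' code.

```python
def select_lambda(train_rsr, evaluate, candidates, tol=0.5):
    """Pick the largest lasso coefficient whose clean accuracy stays within
    `tol` percentage points of the lambda=0 baseline."""
    baseline_acc = evaluate(train_rsr(lam=0.0))
    best = 0.0
    for lam in sorted(candidates):
        acc = evaluate(train_rsr(lam=lam))
        if baseline_acc - acc <= tol:   # no meaningful clean-accuracy drop
            best = lam
    return best
```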

Figure 2: a) Test accuracy (%) under PGD attack versus the percentage of weights pruned (exactly equal to zero). Each network can be pruned only up to a certain level of sparsity; pruning beyond that level makes the model over-sparsified Guo et al. [2018] and the network no longer remains robust. b) Clean and perturbed-data (PGD) accuracy (%) for ResNet-20 (RSR) versus $\lambda$, the regularization coefficient of the lasso loss. c) The x-axis shows different threshold values $\gamma$ and the y-axis shows the percentage of weights below $\gamma$; $\gamma$ is the least absolute weight value remaining after pruning. Clean, Adv and RSR denote clean-data training, adversarial training and our proposed RSR method, respectively. This plot covers only the convolution layers of the ResNet-18 architecture.

Black-Box Attack.

We report the black-box attack results for the ResNet-20 architecture in Table 3. We test our defense against the un-targeted ZOO attack Chen et al. [2017b], randomly selecting 200 test samples to calculate the attack success rate. Our proposed method defends against the ZOO attack better, decreasing the attack success rate by 12.5% (from 68.5% to 56.0%) compared to the baseline PGD method.

Method   ZOO Success Rate (%)   Source: VGG-16, Accuracy (%)   Source: ResNet-18, Accuracy (%)
PGD      68.50                  66.13                          67.44
RSR      56.00                  66.04                          67.27

Table 3: Black-box attack summary. The ZOO attack success rate (2nd column) is the percentage of test samples successfully moved to a wrong class by the attack. We report two sets of transfer attack accuracy: one with VGG-16 as the source model (3rd column) and the other with ResNet-18 as the source model (4th column). For both PGD and RSR, ResNet-20 is the target model.

To perform the transfer attack on RSR and PGD, we use the VGG-16 and ResNet-18 networks as source models. In both cases, RSR performs on par with the PGD baseline. Additionally, RSR reports higher test accuracy under black-box attacks than under the white-box PGD attack. Better resistance against black-box than white-box attacks is considered a sign that a defense does not rely on obfuscated or masked gradients Athalye et al. [2018].

Comparison to state-of-the-art techniques.

In table 4, we summarize the performance of our defense in comparison to some other state-of-the-art defense techniques. Our proposed RSR method outperforms these comparative defenses and achieves significant robustness improvement.

                  Adversarial Training           Compression                      Regularization              This work
Scheme            PGD          PNI               DQ               SR              DP            RSE           RSR
Model             ResNet-18    ResNet-20 (4x)    Wide ResNet      ResNet-18       Wide ResNet   ResNext       ResNet-18
Clean (%)         86.11        87.7              87.0             81.83           87.0          87.5          86.79
PGD (%)           44.31        49.1              51.8             48.00           25.0          40.0          53.03
Sparsity (%)      0            0                 (6) compression  -               0             0             85.36

Table 4: Comparison with three major categories of defense: a) adversarial-training defenses: Projected Gradient Descent (PGD) training Madry et al. [2018] and Parametric Noise Injection (PNI) He et al. [2019]; b) compression or pruning techniques: Defensive Quantization (DQ) Lin et al. [2019] and Second Rethinking of Network Pruning (SR) Ye et al. [2019]; c) regularization techniques: Differential Privacy (DP) Lecuyer et al. [2018a] and Robust Self-Ensemble (RSE) Liu et al. [2017].

Note that we compare with unbroken defenses that have not yet been reported to show signs of obfuscated gradients Athalye et al. [2018]. Some previous works on network pruning and robustness Dhillon et al. [2018] might suffer from gradient obfuscation Athalye et al. [2018]. Guo et al. [2018] first theoretically showed the effect of pruning on non-linear DNNs, demonstrating the vulnerability of over-sparsified models to adversarial attacks. However, we are the first to formulate an improved adversarial defense with sparse regularization. Our proposed RSR generates a sparse and compact neural network that achieves state-of-the-art under-attack accuracy and much improved robustness.

5 Analysis

RSR is performing regularization.

Robust Sparse Regularization regularizes the network to enhance both robustness and compactness. It does not show any obvious signs of the gradient masking described in Athalye et al. [2018]. First, RSR performs better against the single-step attack (i.e., FGSM) than against the multi-step attack (i.e., PGD). Second, we report higher test accuracy against black-box attacks than against white-box attacks. Finally, increasing the attack strength linearly decreases the effectiveness of our defense. These observations confirm that our robustness enhancement is not primarily achieved through gradient obfuscation or masking Athalye et al. [2018]; instead, it comes mainly from the regularized training method. However, after pruning, the presence of noise in the surviving weights also plays a critical role. Table 5 summarizes the impact of inference-time noise on robustness.

         With Inference Noise      Without Inference Noise      Baseline (PGD)
         Clean      PGD            Clean      PGD               Clean      PGD
RSR      83.02      47.70          83.47      41.61             82.88      37.57

Table 5: Clean and perturbed-data (PGD) test accuracy (%) for RSR 1) with inference noise and 2) without inference noise, compared against the PGD baseline. We use VGG-16 with 50% sparsity.

When we disable the inference noise, test accuracy under PGD attack drops to 41.61%, confirming that the presence of inference noise contributes heavily to robustness. However, our regularized training still stands out: even after this drop we maintain higher accuracy than the baseline. We therefore conclude that the robustness achieved in this work is a combined effect of regularization (i.e., CNI and lasso), sparsity and inference noise.
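With the NoisyConv2d sketch from Section 3.2, the "without inference noise" setting of Table 5 can be reproduced by zeroing the per-channel scaling coefficients, as in the illustrative toggle below; this is not the authors' exact evaluation code.

```python
import torch

@torch.no_grad()
def disable_inference_noise(model):
    """Zero the trainable noise coefficients so the forward pass uses the plain
    (pruned) weights, i.e. the 'without inference noise' setting of Table 5."""
    for m in model.modules():
        if hasattr(m, "alpha"):   # NoisyConv2d layers from the earlier sketch
            m.alpha.zero_()
```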

Optimal Gamma provides the improvement on three fronts.

After training, we can prune the weights of the network below a certain threshold ($\gamma$). During training, apart from enhancing robustness, RSR mainly shrinks the weights of the network; this weight shrinkage is shown in Figure 2(c). A ResNet-18 network trained with RSR contains 85% of its weights at near-zero values (below a small threshold), so pruning weights with such small values has minimal effect on clean test accuracy and robustness. Thus, the value of $\gamma$ can be tuned to an optimal point for each network to achieve improvement on three fronts: clean-data accuracy, robustness (i.e., under-attack accuracy) and sparsity.
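A small sketch of the measurement behind Figure 2(c): for each candidate γ, compute the percentage of convolution-layer weights whose magnitude falls below it. The function name and the restriction to conv layers are assumptions consistent with the earlier sketches.

```python
import torch
import torch.nn as nn

def fraction_below(model, gammas):
    """Percentage of conv-layer weights with magnitude below each threshold gamma."""
    weights = torch.cat([m.weight.detach().abs().flatten()
                         for m in model.modules() if isinstance(m, nn.Conv2d)])
    return {g: 100.0 * (weights < g).float().mean().item() for g in gammas}
```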

6 Conclusion

We successfully co-optimize network robustness and compactness through our proposed RSR training method. We show that a heavily sparsified network can still resist adversarial examples, yielding a neural network that is both robust and compact. Our proposed method performs this dual optimization during training and resists state-of-the-art white-box and black-box attacks with a much more compact network.

References