ReLU defense against adversarial attacks
I introduce a very simple method to defend against adversarial examples. The basic idea is to raise the slope of the ReLU function at test time. Experiments over the MNIST and CIFAR-10 datasets demonstrate the effectiveness of the proposed defense against a number of strong attacks in both untargeted and targeted settings. While perhaps not as effective as state-of-the-art adversarial defenses, this approach can provide insights for understanding and mitigating adversarial attacks. It can also be used in conjunction with other defenses.
The lockdown has not been too bad! After all, we found some time to explore what we always wanted to but did not have time for. For me, I have been passing the time by immersing myself in adversarial ML. I was playing with the adversarial examples tutorial in PyTorch (https://pytorch.org/tutorials/beginner/fgsm_tutorial.html)
and came across something interesting. So, I decided to share it with you. It is a simple defense that works well against untargeted attacks, and to some extent against targeted ones. Here is how it goes. The idea is to train a CNN with the ReLU activation function but increase its slope at test time (Fig. 1). Let's call this function Sloped ReLU, or SReLU for short: SReLU_α(x) = α · max(0, x), where α is the slope. SReLU becomes ReLU for α = 1. To investigate this idea, I ran the following CNNs (Fig. 8) over MNIST LeCun et al. (1998):
Conv → Pool → SReLU → Conv → Pool → SReLU → FC → SReLU → FC
and over CIFAR-10 Krizhevsky (2009) (referred to as CIFAR10-CNN1):
Conv → SReLU → Pool → Conv → SReLU → Pool → FC → SReLU → FC → SReLU → FC
I also tried a variation of the latter network with SReLUs only after the first two FC layers (referred to as CIFAR10-CNN2). I chose a range of slope values α, both below and above one.
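As a minimal sketch of the activation itself (in NumPy rather than the PyTorch used in the experiments; the function name is mine), SReLU is just ReLU with an adjustable positive slope:

```python
import numpy as np

def srelu(x, alpha=1.0):
    """Sloped ReLU: alpha * max(0, x).
    Recovers the standard ReLU at alpha = 1; the defense raises
    alpha above 1 at test time only."""
    return alpha * np.maximum(0.0, x)
```

The network is trained with alpha = 1 (plain ReLU); only at test time is the slope increased.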
Here, I emphasize the FGSM attack since it is very straightforward Goodfellow et al. (2015). To craft an adversarial example x′, FGSM adds a tiny portion of the gradient of the loss w.r.t. the input image back to the input image (i.e. gradient ascent in the loss):

x′ = x + ε · sign(∇_x J(θ, x, y))
The perturbed image x′ needs to be clipped to the valid range (here [0, 1] in all experiments over both datasets). ε balances the attack success rate against imperceptibility. ε = 0 corresponds to the model performance on the original test set (i.e. no perturbation). To gain a higher attack rate, more perturbation (i.e. larger ε) is needed, which leads to a more noticeable change (and vice versa).
The above formulation is for the untargeted attack. For the targeted attack, instead of increasing the loss for the true class label, we can lower the loss for the desired target class label (or we could do both):

x′ = x − ε · sign(∇_x J(θ, x, y_target))
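Both FGSM variants can be sketched in a few lines, assuming the loss gradient w.r.t. the input has already been computed (in a framework like PyTorch it would come from autograd; this NumPy helper and its name are mine):

```python
import numpy as np

def fgsm_step(x, grad, eps, targeted=False):
    """One FGSM step.
    Untargeted: ascend the loss, x + eps * sign(grad), with grad taken
    w.r.t. the true label. Targeted: descend the loss toward the target
    class, x - eps * sign(grad), with grad taken w.r.t. the target label.
    Either way, clip back to the valid [0, 1] pixel range."""
    step = eps * np.sign(grad)
    x_adv = x - step if targeted else x + step
    return np.clip(x_adv, 0.0, 1.0)
```

Note how the clip can silently cap the effective perturbation for pixels already near 0 or 1.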
In addition to FGSM, I also considered a number of other strong attacks including BIM (also known as I-FGSM; iterative FGSM) Kurakin et al. (2016), RFGSM Tramèr et al. (2017), StepLL Kurakin et al. (2016), PGD Madry et al. (2017), and DeepFool Moosavi-Dezfooli et al. (2016). In almost all of these attacks (except DeepFool, for which I varied the number of iterations), there is a parameter that controls the magnitude of the perturbation (here represented by ε). In what follows I show the results over the MNIST and CIFAR-10 datasets against both untargeted and targeted attacks.
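The iterative attacks follow a common pattern: repeat a small FGSM step and project back into the ε-ball around the original image. A sketch of BIM/I-FGSM under that assumption (`grad_fn` is a hypothetical helper returning the loss gradient w.r.t. the input):

```python
import numpy as np

def bim(x, grad_fn, eps, alpha=0.01, steps=10):
    """BIM / I-FGSM sketch: small signed-gradient steps, each followed by
    projection into the eps-ball around x and into the [0, 1] pixel range.
    grad_fn(x_adv) is an assumed helper returning dLoss/dInput."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

PGD differs mainly in starting from a random point inside the ε-ball rather than from x itself.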
[Fig. 2 caption: ε = 0 corresponds to the classifier accuracy with no input perturbation. A higher y value means a better defense.]
Results are shown in Fig. 2 over the entire test sets of both datasets. I did not find SReLU slopes smaller than one very interesting, but I include them here for the sake of comparison. Their effect is not consistent: they are useful against targeted attacks but almost always hinder both classifier accuracy and robustness in the untargeted setting.
Over MNIST, at slope α = 1 and ε = 0 (top left panel in Fig. 2) classifier accuracy is around 98%, and it gradually falls as ε grows. For slopes above one, the performance loss starts to recover. Finally, for the largest slope the classifier is not impacted at all, hence completely defeating the attack. Classifier performance as a function of slope (averaged over epsilons) for each attack type is shown in the middle row of Fig. 2. The ReLU defense does very well against the FGSM, BIM, RFGSM, and PGD attacks. Over StepLL and DeepFool, however, it neither helped nor harmed the classifier. Results over the CIFAR-10 dataset are consistent with the MNIST results (middle and right columns in Fig. 2, corresponding to CIFAR10-CNN1 and CIFAR10-CNN2, respectively). On this dataset, the classifier had around 60% accuracy on the clean test set (ε = 0), which went down with more perturbation. Here again, increasing the slope led to better defense, although sometimes at the expense of accuracy. Please see Fig. 3 for the performance of each attack type as a function of perturbation magnitude.
Are these findings specific to the SReLU function? To answer this question, I swapped SReLU with other activation functions and measured the FGSM performance. The attacks became even more damaging (Fig. 2, bottom row), indicating that SReLU does better.
To assess this defense against a wider range of adversarial attacks, I conducted a larger-scale analysis using the Foolbox code repository Rauber et al. (2017) over the first 2K images of each dataset. The proposed defense was tested against 25 attacks utilizing different Lp norms or image transformations (e.g. Gaussian blurring). The top row in Fig. 4 shows the mean classifier accuracy averaged over all attacks. Increasing the slope improved the defense, although there seems to be a trade-off between accuracy and robustness (compare the largest slope vs. slope 1 in the top row of Fig. 4). Looking across the attacks (bottom row in Fig. 4, shown only for the largest slope, at one fixed ε per dataset), however, reveals that the boost in improvement comes only from a few attacks against which the defense has done a great job, in particular FGSM and its variants (e.g. PGD). Against the VirtualAdversarial attack, the proposed defense made the situation much worse. Overall, it seems that this defense works better against gradient-based attacks as opposed to image-degradation ones such as additive noise, salt-and-pepper noise, and the inversion attack. Nonetheless, the ReLU defense was able to recover a performance loss of around 15% over MNIST and around 10-15% over CIFAR-10 (averaged over attacks; compare the largest slope vs. slope 1 in the bottom row of Fig. 4).
Performance of the ReLU defense against the targeted FGSM attack is reported in Fig. 5 over the full test sets of the MNIST and CIFAR-10 datasets. The y axis here shows the success rate of the attack in fooling the CNN into classifying an image as the desired target class (if it has not already been classified as such). As expected, increasing ε (the perturbation magnitude) leads to higher attack accuracy. Over both datasets, increasing the SReLU slope above one reduces the attack success rate, which means a better defense. Here, the ReLU defense seems to perform about the same for all slopes greater than one.
Defense results over individual classes are shown in Fig. 6. Increasing the slope helped in 6 classes, out of 10, using both MNIST CNN and CIFAR10-CNN1. Using CIFAR10-CNN2, 4 classes were defended better. In the remaining cases, the defense tied with the slope of 1 (i.e. no defense) or slightly deteriorated the robustness. On average, it seems that ReLU defense is effective against the targeted FGSM attack, although not as good as its performance against the untargeted FGSM.
What is happening? I suspect this defense works because it enhances the signal-to-noise ratio. In the untargeted attack setting, increasing the loss suppresses pixels that are important for making the correct prediction while boosting irrelevant features. Since relevant features are associated with higher weights, increasing the SReLU slope enhances those features more than the irrelevant ones, hence recovering the good features. For the same reason, this defense is not very effective against targeted attacks. In that setting, the attacker aims to lower the loss in favor of the target class. Raising the SReLU slope then enhances features that are important to the target class as well as the true class. This ignites a competition between the two sets of features which can go either way.
The above explanation is reminiscent of attention mechanisms in human vision (and also in computer vision), in particular neural gain modulation theories. These theories explain behavioral and neural data in a variety of visual tasks such as discrimination and visual search Scolari and Serences (2010); Borji and Itti (2014). On a related note, elevating the bias of neurons in different layers, as is done in Borji and Lin (2020), may lead to similar observations.
Is increasing the SReLU slope the same as scaling up the image by a factor? In general, no. Notice that, for a single unit with a bias term, SReLU_α(wx + b) = α · max(0, wx + b), whereas feeding in the scaled image gives max(0, αwx + b); the two differ whenever b ≠ 0. For a linear network with positive weights the answer is yes, but for non-linear CNNs comprised of positive and negative weights the answer is no. Just to make sure, I ran an experiment in which I multiplied the pixel values by α and computed the classifier performance under the untargeted FGSM attack. Results are shown in Fig. 7. Clipping the pixel values to the [0, 1] range (after scaling) did not improve the robustness. Interestingly, without clipping the pixel values, the results resemble those obtained by increasing the SReLU slope (top right panel in Fig. 7; compare with Fig. 2). This suggests that maybe instead of increasing the SReLU slope we can just scale the pixel values! Results over the CIFAR-10 dataset, however, do not support this hypothesis. On this dataset, scaling does not help robustness with or without clipping. The discrepancy between the two datasets can be attributed to the fact that MNIST digits are gray-level whereas CIFAR-10 images are RGB. Increasing the pixel intensity of MNIST digits maintains high classifier accuracy while at the same time making the attacker's job harder, since they now have to increase the magnitude of the perturbation. In conclusion, this analysis suggests that increasing pixel values is not as effective as increasing the SReLU slope. This, however, needs further investigation. If the opposite turns out to be true, it would have the following unfortunate consequence: to counter the ReLU defense, the attacker could simply submit a scaled-down version of the input image.
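A quick numerical check of the distinction, using a single ReLU unit with a bias (the weights and inputs here are made up purely for illustration): raising the slope multiplies the post-activation output, whereas scaling the image changes the pre-activation before the bias and the zero-threshold apply, so the two generally disagree.

```python
import numpy as np

alpha = 3.0                             # test-time slope / scale factor
w, b = np.array([1.0, -2.0]), -0.5      # toy weights and bias
x = np.array([0.8, 0.1])                # toy "image"

pre = float(w @ x + b)                  # pre-activation: 0.1
slope_raised = alpha * max(0.0, pre)    # SReLU at test time: 0.3
image_scaled = max(0.0, float(w @ (alpha * x)) + b)  # scaled input: 1.3

# The two disagree whenever the bias is non-zero; only in a bias-free
# setting (e.g. a linear, positive-weight network) do they coincide.
```

With a zero bias the two expressions collapse to the same value, which is why the distinction only shows up in networks with bias terms and mixed-sign weights.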
Last but not least, the fact that this simple defense consistently works against some strong attacks (e.g. PGD) is surprising. But please take these findings with a healthy dose of scepticism as I am still investigating them. Future work should evaluate the proposed approach on other models (e.g. ResNet), other datasets (e.g. ImageNet), and against black-box attacks.
Acknowledgement. I would like to thank Google for making the Colaboratory platform available.
Madry, A., et al. (2017). Towards deep learning models resistant to adversarial attacks. CoRR abs/1706.06083.
Rauber, J., et al. (2017). Foolbox v0.8.0: a Python toolbox to benchmark the robustness of machine learning models. CoRR abs/1707.04131.