## 1 Introduction

Deep neural networks (DNNs) have shown tremendous success in a variety of applications, including image classification and generation, text recognition, machine translation, and playing games.

Despite achieving notable performance on the aforementioned tasks, DNNs can be sensitive to small perturbations of their inputs. szegedy2013intriguing have shown that it is possible to abuse this sensitivity to create *adversarial examples*: visually indistinguishable inputs which are classified differently. Subsequent works proposed different adversarial attacks, i.e., techniques for creating adversarial examples.

One of the first practical attacks is FGSM [goodfellow2014explaining], which uses the appropriately scaled sign of the gradient. PGD [madry2018towards], one of the strongest attacks to date, improved FGSM by repeating the gradient step iteratively, i.e., performing projected gradient ascent in the neighbourhood of the input. C&W [carlini2017towards] used a loss term penalizing large distance from the original input instead of applying a hard restriction on it. As a result, the attack is *unbounded*, i.e., it tries to find a minimal-norm adversarial example rather than searching for one in a predefined region. DDN [rony2019decoupling] significantly improved the runtime and performance of C&W by decoupling the optimization of direction and norm.
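The difference between the single-step FGSM and the iterative PGD can be illustrated with a minimal sketch. The toy logistic classifier, its weights `w` and `b`, and the helper names are assumptions made for illustration only and do not come from the cited works; the random start commonly used with PGD is omitted for brevity.

```python
import numpy as np

def loss_and_grad(w, b, x, y):
    """Logistic loss of a toy linear classifier and its gradient w.r.t. the input x."""
    z = float(np.dot(w, x) + b)
    p = 1.0 / (1.0 + np.exp(-z))              # predicted probability of class 1
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w                      # d(loss)/dx for the logistic loss
    return loss, grad_x

def fgsm(w, b, x, y, eps):
    """Single step along the sign of the input gradient, scaled by eps."""
    _, g = loss_and_grad(w, b, x, y)
    return x + eps * np.sign(g)

def pgd(w, b, x, y, eps, step, n_steps):
    """Repeated signed-gradient ascent, projected onto the L-inf ball of radius eps."""
    x_adv = x.copy()
    for _ in range(n_steps):
        _, g = loss_and_grad(w, b, x_adv, y)
        x_adv = x_adv + step * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection back into the ball
    return x_adv

# Hypothetical toy model and input (illustrative values only).
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.1, 0.4])
y = 1

x_fgsm = fgsm(w, b, x, y, eps=0.1)
x_pgd = pgd(w, b, x, y, eps=0.1, step=0.03, n_steps=10)
```

Both perturbed inputs stay within the same L-infinity budget `eps`; PGD simply spends it over several smaller projected steps, which matters for non-linear models where the gradient direction changes along the way.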

It was noted that adversarial examples can be created even without access to the internals of the model and, in particular, its gradients, i.e., by treating the model as a black box (as opposed to the previously mentioned white-box attacks). Black-box attacks can be roughly divided into two main approaches. The first is to use a different model with known gradients to generate adversarial examples and then transfer them to the victim model [papernot2017practical, liu2016delving]. The other is to estimate the gradients of the model numerically, based solely on its inputs and outputs [chen2017zoo, li2019nattack, wierstra2008natural, anonymous2020bayesopt].

To counter adversarial attacks, it was suggested to add adversarial examples to the training process and to balance between them and the original images [szegedy2013intriguing, madry2018towards]. Many subsequent works have tried to increase the strength of training-time attacks to improve robustness [khoury2019adversarial, liu2019training, jiang2018learning, balaji2019instance, zantedeschi2017efficient]. A different approach to overcoming adversarial attacks is to add randomization to the neural network [zheng2016improving, zhang2019defending], making it harder for the attacker to evaluate gradients and to find the vulnerabilities of the network. Recently, rakin2018parametric proposed adding Gaussian noise to the weights and activations of the network and showed an improvement over vanilla adversarial training under various attacks.
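As an illustration of the numerical gradient estimation used by black-box attacks, the following sketch approximates a gradient from input-output queries alone, using coordinate-wise symmetric finite differences. The function `black_box_loss` is a hypothetical stand-in for a model the attacker can only query, and the sketch is not the exact procedure of any of the cited methods.

```python
import numpy as np

def estimate_gradient(f, x, h=1e-4):
    """Estimate the gradient of f at x using only evaluations of f."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h                                   # perturb one coordinate at a time
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

def black_box_loss(x):
    """Hypothetical score the attacker can evaluate but not differentiate."""
    return float(np.sum(x ** 2) + np.sin(x[0]))

x = np.array([0.5, -1.0, 2.0])
g_est = estimate_gradient(black_box_loss, x)

# Analytic gradient, available here only because the stand-in function is known.
g_true = 2 * x + np.array([np.cos(x[0]), 0.0, 0.0])
```

This scheme needs two queries per input coordinate, which is why practical black-box attacks on high-dimensional inputs rely on cheaper estimators such as random-direction sampling.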

#### Contribution.

In this paper, we propose a generalization of parametric noise injection (PNI) [rakin2018parametric], which we term parametric colored noise injection (CNI). The main idea is to replace independent noise with low-rank multivariate Gaussian noise. We show that this change provides a consistent accuracy improvement under various attacks on a number of datasets.

## 2 Method

We note that PNI [rakin2018parametric] is similar to variational dropout [kingma2015variational]. kingma2015variational studied the addition of both correlated and uncorrelated random noise to the weights, claiming that dropout [srivastava2014dropout] is a particular case of such additive noise. They showed that variational dropout is a powerful regularization technique and, in particular, demonstrated the advantage of correlated noise over uncorrelated noise. However, rakin2018parametric only considered the addition of uncorrelated noise. In addition, rather than focusing on weights alone, PNI tested the addition of noise to both weights and activations. We tried to improve the quality of the defence by introducing correlation between different components of the noise vector injected into weights or activations.

## 3 Experiments

Method | Clean accuracy, mean±std % | PGD accuracy, mean±std %
---|---|---
PNI-W | |
CNI-W | |
CNI-A-a | |
CNI-W+A-a | |
PNI-W | |
PNI-W+A-a | |

Results of CNI. Mean and standard deviation are calculated over 10 runs for our experiments (upper half) and over 5 runs for the experiments by rakin2018parametric (lower half). Noise is injected into either weights ("W") or output activations ("A-a"). Best results for PNI and CNI are set in bold.

#### Experimental settings.

On CIFAR-10, we trained ResNet-20 defended with CNI for 400 epochs. We used SGD with weight decay, reducing the learning rate at epochs 200 and 300.

For colored noise we used a multivariate normal distribution with low-rank covariance. For an $n$-dimensional vector, the noise $\eta$ with covariance rank $r$ is distributed as

$$\eta \sim \mathcal{N}(0, \Sigma), \tag{1}$$

where

$$\Sigma = D + V V^\top \tag{2}$$

for a diagonal matrix $D$ and an $n \times r$ matrix $V$. Note that PNI is a particular case of CNI with $V = 0$. In our experiments we used a single fixed value of $r$. The value of $r$ is an important hyperparameter, even though the method itself would probably work with other kinds of noise distribution.
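Sampling noise with a diagonal-plus-low-rank covariance can be sketched as follows. The sketch assumes the covariance has the form `diag(d) + V @ V.T`; the names `d`, `V`, `n`, `r` and the function `sample_colored_noise` are illustrative choices and not taken from the original implementation, in which the noise parameters are learned during training rather than fixed as here.

```python
import numpy as np

def sample_colored_noise(d, V, n_samples, rng):
    """Draw zero-mean Gaussian samples with covariance diag(d) + V @ V.T.

    d: (n,) non-negative diagonal entries; V: (n, r) low-rank factor.
    """
    n, r = V.shape
    eps_diag = rng.standard_normal((n_samples, n))   # independent component
    eps_low = rng.standard_normal((n_samples, r))    # shared low-rank component
    return eps_diag * np.sqrt(d) + eps_low @ V.T

rng = np.random.default_rng(0)
n, r = 6, 2                                   # hypothetical dimension and rank
d = np.full(n, 0.1)
V = 0.5 * rng.standard_normal((n, r))

eta = sample_colored_noise(d, V, n_samples=200_000, rng=rng)
emp_cov = np.cov(eta, rowvar=False)           # empirical covariance of the samples
target_cov = np.diag(d) + V @ V.T
```

Setting `V` to zero leaves only the independent component, which corresponds to uncorrelated noise. Sampling this way costs O(nr) per draw and never forms the full n-by-n covariance matrix, which is what makes the low-rank parameterization practical for large layers.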

#### Ablation study.

We study the dependence of the network performance on noise rank. The results are shown in Fig. 3.1.

## 4 Conclusions

In this work we proposed adding colored Gaussian noise to the weights and activations in order to improve the robustness of neural networks under adversarial attacks. Colored noise outperforms the previously suggested method [rakin2018parametric], which uses only independent Gaussian noise, under different adversarial attacks on various datasets.

### Acknowledgments

The research was funded by Hyundai Motor Company through HYUNDAI-TECHNION-KAIST Consortium, ERC StG RAPID, and Hiroshi Fujiwara Technion Cyber Security Research Center.
