Colored Noise Injection for Training Adversarially Robust Neural Networks

by   Evgenii Zheltonozhskii, et al.

Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding independent Gaussian noise to weights and activations during adversarial training (PNI) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on the CIFAR-10 dataset. In addition, we provide an extensive ablation study of the proposed method justifying the chosen configurations.






1 Introduction

Deep Neural Networks (DNNs) have shown tremendous success in a variety of applications, including image classification and generation, text recognition, machine translation, and game playing.

Despite achieving notable performance on the aforementioned tasks, DNNs can be sensitive to small perturbations of their inputs. szegedy2013intriguing have shown that it is possible to exploit this sensitivity to create adversarial examples -- visually indistinguishable inputs which are classified differently. Subsequent works proposed various adversarial attacks --- techniques for creating adversarial examples.

One of the first practical attacks was FGSM [goodfellow2014explaining], which used the appropriately scaled sign of the gradient. PGD [madry2018towards], one of the strongest attacks to date, improved FGSM by repeating the gradient step iteratively, i.e., performing projected gradient ascent in a neighbourhood of the input. C&W [carlini2017towards] used a loss term penalizing large distance from the original input instead of applying a hard restriction on it. In this way, the resulting attack is unbounded, i.e., it tries to find a minimal-norm adversarial example rather than searching for one in a predefined region. DDN [rony2019decoupling] significantly improved the runtime and performance of C&W by decoupling the optimization of direction and norm.
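The two gradient-based attacks above can be sketched on a toy model where the input gradient is available in closed form. This is an illustrative sketch, not the paper's code: the logistic "network" f(x) = sigmoid(w @ x) and the parameters eps, alpha, and steps are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(w, x, y):
    # d/dx of the binary cross-entropy loss for f(x) = sigmoid(w @ x)
    return (sigmoid(w @ x) - y) * w

def fgsm(w, x, y, eps):
    # FGSM: a single signed-gradient step, bounded by eps in L-infinity norm
    return x + eps * np.sign(input_grad(w, x, y))

def pgd(w, x, y, eps, alpha, steps):
    # PGD: iterated signed-gradient ascent, projected back onto the
    # eps-ball around the original input after every step
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(w, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

For a real network the closed-form `input_grad` would be replaced by backpropagation to the input, but the step-and-project structure is the same.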

It was noted that adversarial examples can be created even without access to the internals of the model and, in particular, its gradients, i.e., treating the model as a black box (as opposed to the previously mentioned white-box attacks). Black-box attacks can be roughly divided into two main approaches. The first is to use a different model with known gradients to generate adversarial examples and then transfer them to the victim model [papernot2017practical, liu2016delving]. The second is to estimate the gradients of the model numerically, based solely on its inputs and outputs [chen2017zoo, li2019nattack, wierstra2008natural, anonymous2020bayesopt].

In order to counter adversarial attacks, it was suggested to add adversarial examples to the training process and balance between them and the original images [szegedy2013intriguing, madry2018towards]. Many subsequent works have tried to increase the strength of training-time attacks to improve robustness [khoury2019adversarial, liu2019training, jiang2018learning, balaji2019instance, zantedeschi2017efficient]. A different approach to overcoming adversarial attacks is to add randomization to the neural network [zheng2016improving, zhang2019defending], making it harder for the attacker to estimate gradients and find vulnerabilities of the network. Recently, rakin2018parametric proposed to add Gaussian noise to the weights and activations of the network and showed improvement over vanilla adversarial training under various attacks.
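The adversarial-training recipe above can be sketched on a toy linear classifier: at each step the model trains on FGSM-perturbed inputs mixed with clean ones. This is a hedged sketch under invented data and hyperparameters (eps, lr, mix), not the paper's training setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_example(w, x, y, eps):
    # one signed-gradient step on the input (FGSM)
    grad_x = (sigmoid(w @ x) - y) * w
    return x + eps * np.sign(grad_x)

def adversarial_train(X, Y, eps=0.1, lr=0.5, epochs=200, mix=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # with probability `mix`, replace the clean input by its
            # adversarial counterpart, balancing clean and adversarial loss
            if rng.random() < mix:
                x = fgsm_example(w, x, y, eps)
            # SGD step on the logistic loss for the (possibly perturbed) input
            grad_w = (sigmoid(w @ x) - y) * x
            w -= lr * grad_w
    return w
```

The inner attack here is single-step FGSM; stronger defenses such as madry2018towards use multi-step PGD in the same place.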


In this paper, we propose a generalization of parametric noise injection (PNI) [rakin2018parametric], which we term colored noise injection (CNI). The main idea is to replace independent noise with low-rank multivariate Gaussian noise. We show that this change provides a consistent accuracy improvement under various attacks on a number of datasets.

2 Method

We note that PNI [rakin2018parametric] is similar to variational dropout [kingma2015variational]. kingma2015variational studied the addition of both correlated and uncorrelated random noise to the weights, arguing that dropout [srivastava2014dropout] is a particular case of such additive noise, and showed that variational dropout is a powerful regularization technique. In particular, they demonstrated the advantage of correlated noise over uncorrelated noise. rakin2018parametric, however, considered only the addition of uncorrelated noise; on the other hand, rather than focusing on weights, PNI tested the addition of noise both to weights and to activations. We improve the quality of the defence by introducing correlations between the components of the noise vector injected into the weights or activations.
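The PNI-style injection that CNI generalizes can be sketched as follows. This is our reading of the scheme, not the authors' code: during the forward pass, Gaussian noise scaled by a trainable coefficient `alpha` is added to the layer weights, with the noise drawn at the empirical scale of the weights; the layer shapes are illustrative.

```python
import numpy as np

def noisy_linear_forward(x, w, alpha, rng):
    # draw independent Gaussian noise at the empirical std of the weights
    eta = rng.normal(scale=w.std(), size=w.shape)
    # perturb the weights; alpha is the trainable noise-intensity coefficient
    w_noisy = w + alpha * eta
    return x @ w_noisy.T
```

Setting `alpha = 0` recovers the deterministic layer; CNI replaces the independent `eta` with correlated (colored) noise as described in the next section.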

3 Experiments

Table 3.1: Results of CNI (accuracy, mean±std %, on clean inputs and under PGD). Mean and standard deviation are calculated over 10 runs for our experiments (upper half) and over 5 runs for the experiments of rakin2018parametric (lower half). Noise is injected either into the weights (``W'') or the output activations (``A-a''). Best results for PNI and CNI are set in bold.

Experimental settings.

On CIFAR-10, we trained ResNet-20 defended with CNI for 400 epochs. We used SGD with weight decay, reducing the learning rate at epochs 200 and 300.

For colored noise we used a multivariate normal distribution with low-rank covariance. For an $n$-dimensional vector, the noise with covariance of rank $k$ is distributed as

$\varepsilon \sim \mathcal{N}\left(0,\; D + BB^\top\right)$

for a diagonal matrix $D \in \mathbb{R}^{n \times n}$ and a matrix $B \in \mathbb{R}^{n \times k}$. Note that PNI is a particular case of CNI with $B = 0$. The value of $k$ is an important hyperparameter, even though the method itself would likely work with other kinds of noise distribution.
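The low-rank sampler described above can be implemented without ever forming the full $n \times n$ covariance: a sample of $\mathcal{N}(0, D + BB^\top)$ is the sum of a diagonal component and a rank-$k$ component. A minimal sketch (our illustration, not the paper's code; `d` and `B` are hypothetical parameters):

```python
import numpy as np

def sample_colored_noise(d, B, rng):
    # d: (n,) nonnegative diagonal of D;  B: (n, k) low-rank factor
    n, k = B.shape
    z_diag = rng.normal(size=n)   # independent component, covariance D
    z_rank = rng.normal(size=k)   # shared component, covariance B @ B.T
    # sqrt(d) * z_diag + B @ z_rank has covariance D + B @ B.T
    return np.sqrt(d) * z_diag + B @ z_rank
```

With `B = 0` this degenerates to independent (PNI-style) Gaussian noise; the cost per sample is O(nk) instead of the O(n^2) of a dense covariance.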

Ablation study.

We study the dependence of the network performance on the noise rank. The results are shown in Fig. 3.1.

Figure 3.1: Accuracy of the CNI-W model under PGD attack as a function of the noise covariance rank.

4 Conclusions

In this work we proposed adding colored Gaussian noise to the weights and activations of a neural network in order to improve its robustness under adversarial attacks.

This colored noise outperforms the previously suggested method [rakin2018parametric], which uses only independent Gaussian noise, under different adversarial attacks on various datasets.


The research was funded by Hyundai Motor Company through HYUNDAI-TECHNION-KAIST Consortium, ERC StG RAPID, and Hiroshi Fujiwara Technion Cyber Security Research Center.