Fast and Stable Adversarial Training through Noise Injection
Adversarial training is to date the most successful empirical method for increasing the robustness of neural networks against adversarial attacks. Unfortunately, this higher robustness comes at considerably higher computational cost. So far, only adversarial training with expensive multi-step adversarial attacks such as Projected Gradient Descent (PGD) has proved effective against equally strong attacks. In this paper, we present two ideas that, combined, enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM). First, we add uniform noise to the initial data point of the FGSM attack, which creates a wider variety of stronger adversaries. Second, we prepend a learnable regularization step, the Stochastic Augmentation Layer (SAL), to the neural network. Inputs propagated through the SAL are resampled from a Gaussian distribution. The randomness of this resampling at inference time makes it harder for an attacker to construct an adversarial example, since the model's output is not known in advance. We show that noise injection in conjunction with FGSM adversarial training achieves results comparable to adversarial training with PGD while being orders of magnitude faster. Moreover, we show superior results compared to PGD-based training when combining noise injection and SAL.
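To make the two ideas concrete, the following is a minimal PyTorch sketch, not the authors' reference implementation: the noise-injected FGSM step starts from a uniformly perturbed point inside the epsilon-ball, and the SAL is approximated here by a toy layer whose output is resampled from a Gaussian centred on the input. Names such as `model`, `epsilon`, and the learnable scale `sigma` are illustrative assumptions; the paper's exact SAL parameterisation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def noisy_fgsm(model, x, y, epsilon):
    """FGSM attack whose starting point is perturbed with uniform noise.

    The clean input is shifted by noise drawn from U(-epsilon, epsilon),
    a single signed-gradient step of size epsilon is taken, and the total
    perturbation is clamped back into the epsilon-ball around the input.
    """
    # Random start inside the epsilon-ball (the "noise injection" idea).
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    delta.requires_grad_(True)

    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]

    # One FGSM step, then project onto the epsilon-ball.
    delta = (delta + epsilon * grad.sign()).clamp(-epsilon, epsilon)
    return (x + delta).detach()


class StochasticAugmentationLayer(nn.Module):
    """Toy stand-in for the SAL: each input is resampled from a Gaussian
    centred on the original input, with a hypothetical learnable scale."""

    def __init__(self, sigma_init=0.1):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.tensor(float(sigma_init)).log())

    def forward(self, x):
        sigma = self.log_sigma.exp()
        # Resampling stays stochastic at inference time, so an attacker
        # cannot predict the exact network output for a crafted input.
        return x + sigma * torch.randn_like(x)
```

In a training loop under these assumptions, one would wrap the classifier as `nn.Sequential(StochasticAugmentationLayer(), backbone)` and train on the adversarial examples returned by `noisy_fgsm` instead of the clean inputs.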