Towards Alternative Techniques for Improving Adversarial Robustness: Analysis of Adversarial Training at a Spectrum of Perturbations

by Kaustubh Sridhar, et al.

Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations and common corruptions over the last few years. Algorithm design for AT and its variants typically fixes a single perturbation strength ϵ and uses only the feedback from the resulting ϵ-robust model to improve the algorithm. In this work, we instead analyze models trained across a spectrum of ϵ values. We take three perspectives: model performance, intermediate feature precision, and convolution filter sensitivity. In each, we identify improvements to AT that would not have been apparent at a single ϵ. First, we find that for a PGD attack at some strength δ, the AT model that generalizes best to it is trained at a strength ϵ slightly larger than δ, but no greater. We therefore propose overdesigning for robustness: training models at an ϵ just above δ. Second, we observe, across various ϵ values, that robustness is highly sensitive to the precision of intermediate features, particularly those after the first and second layers. We thus propose adding a simple quantization step to defenses, which improves accuracy on both seen and unseen adaptive attacks. Third, we analyze the convolution filters of each layer of models trained at increasing ϵ and find that those of the first and second layers may be solely responsible for amplifying input perturbations. We present our findings and demonstrate our techniques through experiments with ResNet and WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.
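The quantization step mentioned in the abstract can be illustrated with a minimal sketch: uniformly snapping intermediate activations to a small number of evenly spaced levels. The bit-width, clipping range, and function name below are illustrative assumptions for exposition, not the authors' exact settings.

```python
def quantize_feature(v, n_bits=4, v_min=0.0, v_max=1.0):
    """Uniformly quantize a single activation value.

    Illustrative sketch (assumed parameters): clip the activation to
    [v_min, v_max], snap it to the nearest of (2**n_bits - 1) evenly
    spaced steps, then map it back to the original range.
    """
    levels = 2 ** n_bits - 1          # number of quantization steps
    v = min(max(v, v_min), v_max)     # clip to the assumed range
    step = (v_max - v_min) / levels   # width of one quantization step
    return round((v - v_min) / step) * step + v_min

# Example: activations after an early layer, quantized to 2 bits
feats = [0.03, 0.48, 0.52, 0.97]
print([quantize_feature(v, n_bits=2) for v in feats])
```

In the paper's setting, such a step would be applied to the features after the first and second layers, where robustness is reported to be most sensitive to precision.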



