Intriguing generalization and simplicity of adversarially trained neural networks

06/16/2020
by Chirag Agarwal et al.

Adversarial training has been the topic of dozens of studies and is a leading method for defending against adversarial attacks. Yet, it remains unknown (a) how adversarially trained classifiers (a.k.a. "robust" classifiers) generalize to new types of out-of-distribution examples; and (b) which hidden representations robust networks learn. In this paper, we perform a thorough, systematic study to answer these two questions on AlexNet, GoogLeNet, and ResNet-50 trained on ImageNet. While robust models often perform on par with, or worse than, standard models on unseen distorted, texture-preserving images (e.g., blurred), they are consistently more accurate on texture-less images (i.e., silhouettes and stylized images). That is, robust models rely heavily on shapes, in stark contrast to the strong texture bias of standard ImageNet classifiers (Geirhos et al. 2018). Remarkably, adversarial training causes three significant shifts in the functions of hidden neurons: each convolutional neuron often changes to (1) detect pixel-wise smoother patterns; (2) detect more low-level features, i.e., textures and colors (instead of objects); and (3) become simpler in terms of complexity, i.e., detect more limited sets of concepts.
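The adversarial training procedure the abstract builds on can be sketched in a few lines: at each training step, an inner PGD attack perturbs the inputs to maximize the loss, and the model is then updated on those worst-case inputs. Below is a minimal numpy illustration on a toy 2-D logistic-regression problem; the data, model, and hyperparameters are illustrative assumptions, not the paper's ImageNet setup.

```python
import numpy as np

# Minimal sketch of PGD-based adversarial training on a toy 2-D
# logistic-regression problem. Illustrative only: data, model, and
# hyperparameters are assumptions, not the paper's ImageNet setup.

rng = np.random.default_rng(0)

# Two well-separated Gaussian blobs (binary labels 0/1).
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)),
               rng.normal(+1.0, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(w, b, Xb, yb):
    """Gradient of the logistic loss w.r.t. the *inputs* (used by the attack)."""
    p = sigmoid(Xb @ w + b)
    return (p - yb)[:, None] * w[None, :]

def pgd_attack(w, b, X0, y0, eps=0.3, alpha=0.1, steps=5):
    """Projected gradient ascent inside an L-infinity ball of radius eps."""
    X_adv = X0.copy()
    for _ in range(steps):
        X_adv = X_adv + alpha * np.sign(input_grad(w, b, X_adv, y0))
        X_adv = np.clip(X_adv, X0 - eps, X0 + eps)  # project back into the ball
    return X_adv

def train(adversarial, epochs=200, lr=0.5):
    """Plain gradient descent; optionally on PGD-perturbed inputs."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        Xb = pgd_attack(w, b, X, y) if adversarial else X
        p = sigmoid(Xb @ w + b)
        w -= lr * Xb.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, Xe):
    return float(np.mean((sigmoid(Xe @ w + b) > 0.5) == y))

w_std, b_std = train(adversarial=False)  # standard training
w_rob, b_rob = train(adversarial=True)   # adversarial ("robust") training
```

The robust model is trained on worst-case perturbations of each batch, so it should retain accuracy when attacked with PGD at test time, whereas the rest of the abstract concerns what this procedure does to the features such models learn.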


Related research

09/23/2019: Robust Local Features for Improving the Generalization of Adversarial Training
Adversarial training has been demonstrated as one of the most effective ...

09/18/2020: Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization
Adversarial training is the industry standard for producing models that ...

03/03/2023: Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models
While adversarial training has been extensively studied for ResNet archi...

03/03/2022: Why adversarial training can hurt robust accuracy
Machine learning classifiers with high test accuracy often perform poorl...

05/23/2019: Interpreting Adversarially Trained Convolutional Neural Networks
We attempt to interpret how adversarially trained convolutional neural n...

07/22/2020: Adversarial Training Reduces Information and Improves Transferability
Recent results show that features of adversarially trained networks for ...

05/29/2019: Learning Robust Global Representations by Penalizing Local Predictive Power
Despite their renowned predictive power on i.i.d. data, convolutional ne...
