Feature Purification: How Adversarial Training Performs Robust Deep Learning

05/20/2020
by Zeyuan Allen-Zhu, et al.

Despite the great empirical success of adversarial training in defending deep learning models against adversarial perturbations, it remains rather unclear what principles lie behind the existence of adversarial perturbations, and what adversarial training does to the neural network to remove them.

In this paper, we present a principle that we call "feature purification": we show that the existence of adversarial examples is due to the accumulation of certain "dense mixtures" in the hidden weights during the training process of a neural network, and, more importantly, that one of the goals of adversarial training is to remove such mixtures and thereby "purify" the hidden weights. We present both experiments on the CIFAR-10 dataset to illustrate this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation by randomly initialized gradient descent indeed satisfies this principle.

Technically, we give, to the best of our knowledge, the first result proving that the following two statements can hold simultaneously for training a neural network with ReLU activation: (1) training over the original data is indeed non-robust to small adversarial perturbations of some radius, and (2) adversarial training, even with an empirical perturbation algorithm such as FGM, can in fact be provably robust against ANY perturbations of the same radius. Finally, we also prove a complexity lower bound showing that low-complexity models such as linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network CANNOT defend against perturbations of this same radius, no matter what algorithms are used to train them.
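The robustness result above concerns adversarial training with a one-step empirical attack such as FGM (fast gradient method). The sketch below illustrates what such a training loop looks like for a two-layer ReLU network in PyTorch; it is only a minimal illustration, and the network widths, perturbation radius eps, optimizer settings, and helper names (fgm_perturb, adversarial_train) are assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of FGM adversarial training for a two-layer ReLU network.
# Hyperparameters (eps, widths, lr, epochs) are illustrative assumptions.
import torch
import torch.nn as nn

def fgm_perturb(model, x, y, eps, loss_fn):
    """One-step L2 fast-gradient-method perturbation of radius eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    # Normalize the gradient per example so each perturbation has L2 norm eps.
    g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1)
    return (x_adv + eps * grad / g_norm).detach()

def adversarial_train(model, loader, eps=0.5, epochs=10, lr=0.1):
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:               # x: (batch, 784) flattened inputs
            x_adv = fgm_perturb(model, x, y, eps, loss_fn)  # empirical attack
            opt.zero_grad()
            loss_fn(model(x_adv), y).backward()  # train on perturbed inputs
            opt.step()

# A two-layer ReLU network of the kind analyzed in the paper (widths illustrative).
model = nn.Sequential(nn.Linear(784, 1000), nn.ReLU(), nn.Linear(1000, 10))
```

Under the feature-purification view, one would compare the hidden-layer weight vectors of this model before and after adversarial training: the clean-trained weights contain small but dense mixtures of many input directions, and adversarial training removes them.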


Related research

03/16/2022 · On the Convergence of Certified Robust Training with Interval Bound Propagation
Interval Bound Propagation (IBP) is so far the base of state-of-the-art ...

10/28/2020 · Most ReLU Networks Suffer from ℓ^2 Adversarial Perturbations
We consider ReLU networks with random weights, in which the dimension de...

02/16/2020 · Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Adversarial training is a popular method to give neural nets robustness ...

06/14/2020 · Proximal Mapping for Deep Regularization
Underpinning the success of deep learning is effective regularizations t...

05/25/2021 · Practical Convex Formulation of Robust One-hidden-layer Neural Network Training
Recent work has shown that the training of a one-hidden-layer, scalar-ou...

02/09/2022 · Gradient Methods Provably Converge to Non-Robust Networks
Despite a great deal of research, it is still unclear why neural network...

04/19/2021 · Provable Robustness of Adversarial Training for Learning Halfspaces with Noise
We analyze the properties of adversarial training for learning adversari...
