Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training

by Binghui Li et al.

Adversarial training is a standard method for training deep neural networks to be robust to adversarial perturbations. Similar to the surprising clean generalization ability observed in the standard deep learning setting, neural networks trained by adversarial training also generalize well to unseen clean data. However, in contrast with clean generalization, while adversarial training achieves low robust training error, there still exists a significant robust generalization gap. This motivates us to explore what mechanism leads to both clean generalization and robust overfitting (CGRO) during the learning process. In this paper, we provide a theoretical understanding of this CGRO phenomenon in adversarial training. First, we propose a theoretical framework of adversarial training, in which we analyze the feature learning process to explain how adversarial training leads the network learner to the CGRO regime. Specifically, we prove that, under our patch-structured dataset, the CNN model partially learns the true feature but exactly memorizes the spurious features from the adversarial examples generated on the training set, which results in clean generalization and robust overfitting. For more general data assumptions, we then show the efficiency of the CGRO classifier from the perspective of representation complexity. On the empirical side, to verify our theoretical analysis on real-world vision datasets, we investigate the dynamics of the loss landscape during training. Moreover, inspired by our experiments, we prove a robust generalization bound based on the global flatness of the loss landscape, which may be of independent interest.
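To make the min-max structure of adversarial training and the "robust generalization gap" concrete, here is a minimal sketch. It assumes a linear classifier on a hypothetical toy Gaussian dataset, not the paper's patch-structured data or CNN. For a linear score w.x + b, the inner maximization over an l_inf ball of radius eps has a closed form, so no iterative attack (such as PGD) is needed. All function names (make_data, worst_case_perturb, adversarial_train, clean_error, robust_error) are illustrative assumptions.

```python
import math
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(v):
    """Sign of a scalar: -1, 0, or +1."""
    return (v > 0) - (v < 0)


def make_data(n, d, rng, signal=2.0, noise=1.0):
    """Hypothetical toy data: label y in {-1, +1}; coordinate 0 carries
    y * signal, and every coordinate gets Gaussian noise."""
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = [rng.gauss(0.0, noise) for _ in range(d)]
        x[0] += signal * y
        data.append((x, y))
    return data


def worst_case_perturb(w, x, y, eps):
    """Inner maximization for a linear score w.x + b: within the l_inf ball of
    radius eps, delta = -eps * y * sign(w) minimizes the margin y * (w.x + b)."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def logistic_weight(margin):
    """Numerically stable 1 / (1 + exp(margin)), i.e. minus the derivative of
    the logistic loss log(1 + exp(-margin)) with respect to the margin."""
    if margin > 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def adversarial_train(data, eps, lr=0.1, epochs=200):
    """Outer minimization: SGD on the logistic loss evaluated at the worst-case
    perturbed input, recomputed from the current weights at every step."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = worst_case_perturb(w, x, y, eps)
            s = logistic_weight(y * (dot(w, x_adv) + b))
            w = [wi + lr * s * y * xi for wi, xi in zip(w, x_adv)]
            b += lr * s * y
    return w, b


def clean_error(w, b, data):
    """Fraction of points misclassified (margin <= 0) without perturbation."""
    return sum(1 for x, y in data if y * (dot(w, x) + b) <= 0) / len(data)


def robust_error(w, b, data, eps):
    """Fraction misclassified under the worst-case l_inf attack of radius eps;
    for a linear model the worst-case margin is y * (w.x + b) - eps * ||w||_1."""
    l1 = sum(abs(wi) for wi in w)
    return sum(1 for x, y in data if y * (dot(w, x) + b) - eps * l1 <= 0) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    train = make_data(50, 20, rng)
    test = make_data(500, 20, rng)
    eps = 0.1
    w, b = adversarial_train(train, eps)
    print("clean  train/test error:", clean_error(w, b, train), clean_error(w, b, test))
    print("robust train/test error:", robust_error(w, b, train, eps), robust_error(w, b, test, eps))
    print("robust generalization gap:",
          robust_error(w, b, test, eps) - robust_error(w, b, train, eps))
```

The robust generalization gap is measured as robust test error minus robust training error. This toy linear setup only shows how the quantities are computed. It is not meant to reproduce CGRO, which the paper analyzes for a CNN that memorizes spurious features.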

