Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial Training

by Guodong Cao, et al.

Adversarial training has been widely explored for mitigating attacks against deep models. However, most existing works remain caught in the trade-off between higher accuracy and stronger robustness, since they tend to fit the model to robust features (those not easily tampered with by adversaries) while discarding non-robust but highly predictive features. To achieve a better robustness-accuracy trade-off, we propose Vanilla Feature Distillation Adversarial Training (VFD-Adv), which distills knowledge from a pre-trained model (optimized for high accuracy) to guide adversarial training towards higher accuracy, i.e., to preserve those non-robust but predictive features. More specifically, both adversarial examples and their clean counterparts are forced to align in the feature space by distilling predictive representations from the pre-trained (clean) model, whereas previous works barely exploit the predictive features of clean models. The adversarially trained model is therefore updated to preserve accuracy as much as possible while gaining robustness. A key advantage of our method is that it can be universally adapted to, and boost, existing adversarial training methods. Extensive experiments on various datasets, classification models, and adversarial training algorithms demonstrate the effectiveness of the proposed method.
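The abstract describes a loss that combines the usual adversarial classification objective with a feature-distillation term pulling both adversarial and clean features of the trained model towards those of a frozen pre-trained ("vanilla") model. A minimal sketch of such a combined loss is below; the function names, the MSE alignment term, and the weighting factor `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import math

def mse(a, b):
    """Mean squared error between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def vfd_loss(adv_logits, label, feat_adv, feat_clean, feat_vanilla, lam=1.0):
    """Adversarial classification loss plus vanilla feature distillation:
    features of both the adversarial example (feat_adv) and its clean
    counterpart (feat_clean) are aligned with the frozen pre-trained
    model's features (feat_vanilla), weighted by lam (assumed hyperparameter)."""
    ce = cross_entropy(adv_logits, label)
    distill = mse(feat_adv, feat_vanilla) + mse(feat_clean, feat_vanilla)
    return ce + lam * distill
```

When the student's features already match the vanilla model's on both inputs, the distillation term vanishes and the loss reduces to the plain adversarial cross-entropy, which is what lets the method bolt onto existing adversarial training objectives.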




