Understanding Catastrophic Overfitting in Adversarial Training

05/06/2021
by Peilin Kang, et al.

Recently, FGSM adversarial training has been found to train robust models comparable to those trained with PGD, but an order of magnitude faster. However, it suffers from a failure mode called catastrophic overfitting (CO), in which the classifier suddenly loses its robustness during training and hardly recovers by itself. In this paper, we find that CO is not limited to FGSM but also happens in DF^∞-1 adversarial training. We then analyze the geometric properties of both FGSM and DF^∞-1 and find that they have entirely different decision boundaries after CO. For FGSM, a new decision boundary is generated along the direction of the perturbation, which makes small perturbations more effective than large ones. For DF^∞-1, in contrast, no new decision boundary is generated along the perturbation direction; instead, the perturbation produced by DF^∞-1 becomes smaller after CO and thus loses its effectiveness. We also experimentally analyze three hypotheses about potential factors causing CO. Based on this empirical analysis, we modify RS-FGSM by not projecting the perturbation back onto the ℓ_∞ ball. With this small modification, we achieve 47.56 ± 0.37% PGD-50-10 accuracy on CIFAR-10 with ϵ=8/255, compared to 43.57 ± 0.30% for RS-FGSM, and further extend the working range of ϵ from 8/255 to 11/255 on CIFAR-10 without CO occurring.
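To make the described modification concrete, here is a minimal PyTorch sketch of a single RS-FGSM perturbation step with an optional projection flag. The function name rs_fgsm_perturbation, the step size alpha, and the overall setup are illustrative assumptions rather than the authors' implementation; calling it with project=False corresponds to the change described above, i.e. skipping the projection back onto the ℓ_∞ ball.

```python
import torch
import torch.nn.functional as F

def rs_fgsm_perturbation(model, x, y, epsilon, alpha, project=True):
    """One RS-FGSM step: random start inside the l_inf ball, then an FGSM step.

    project=True  -> standard RS-FGSM (perturbation clipped back to the ball).
    project=False -> the modification sketched in the abstract: no projection.
    """
    # Random start drawn uniformly from the l_inf ball of radius epsilon.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    delta.requires_grad_(True)

    # Single gradient computation at the randomly perturbed point.
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]

    # FGSM step of size alpha in the sign direction of the gradient.
    delta = delta.detach() + alpha * grad.sign()

    if project:
        # Standard RS-FGSM projects the perturbation back onto the ball;
        # omitting this is the small modification discussed above.
        delta = torch.clamp(delta, -epsilon, epsilon)

    # Keep the perturbed image in the valid pixel range in either case.
    delta = torch.clamp(x + delta, 0.0, 1.0) - x
    return delta
```

In a training loop, the returned delta would be added to the clean batch before the usual forward/backward pass on model(x + delta); epsilon (e.g. 8/255) and alpha are hyperparameters whose exact values in the paper are not given here.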
