On the Convergence and Robustness of Adversarial Training

by   Yisen Wang, et al.

Improving the robustness of deep neural networks (DNNs) to adversarial examples is an important yet challenging problem for secure deep learning. Across existing defense techniques, adversarial training with Projected Gradient Decent (PGD) is amongst the most effective. Adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial examples by maximizing the classification loss, and the outer minimization finding model parameters by minimizing the loss on adversarial examples generated from the inner maximization. A criterion that measures how well the inner maximization is solved is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely First-Order Stationary Condition for constrained optimization (FOSC), to quantitatively evaluate the convergence quality of adversarial examples found in the inner maximization. With FOSC, we find that to ensure better robustness, it is essential to use adversarial examples with better convergence quality at the later stages of training. Yet at the early stages, high convergence quality adversarial examples are not necessary and may even lead to poor robustness. Based on these observations, we propose a dynamic training strategy to gradually increase the convergence quality of the generated adversarial examples, which significantly improves the robustness of adversarial training. Our theoretical and empirical results show the effectiveness of the proposed method.


page 1

page 2

page 3

page 4


Adversarial Distributional Training for Robust Deep Learning

Adversarial training (AT) is among the most effective techniques to impr...

Convergence of Adversarial Training in Overparametrized Networks

Neural networks are vulnerable to adversarial examples, i.e. inputs that...

Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization

We propose a general framework for increasing local stability of Artific...

NoiLIn: Do Noisy Labels Always Hurt Adversarial Training?

Adversarial training (AT) based on minimax optimization is a popular lea...

Gradient-Guided Dynamic Efficient Adversarial Training

Adversarial training is arguably an effective but time-consuming way to ...

Robust Deep Learning as Optimal Control: Insights and Convergence Guarantees

The fragility of deep neural networks to adversarially-chosen inputs has...

Probabilistic Categorical Adversarial Attack Adversarial Training

The existence of adversarial examples brings huge concern for people to ...

Please sign up or login with your details

Forgot password? Click here to reset