Log In Sign Up

Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training

by   David Stutz, et al.

Adversarial training is the standard to train models robust against adversarial examples. However, especially for complex datasets, adversarial training incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT) where the key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT preserves better the accuracy of normal training while robustness against adversarial examples is achieved via confidence thresholding. Most importantly, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models, not encountered during training. We also discuss our extensive work to design strong adaptive attacks against CCAT and standard adversarial training which is of independent interest. We present experimental results on MNIST, SVHN and Cifar10.


page 1

page 2

page 3

page 4


On Norm-Agnostic Robustness of Adversarial Training

Adversarial examples are carefully perturbed in-puts for fooling machine...

Calibrated Adversarial Training

Adversarial training is an approach of increasing the robustness of mode...

Gradient Band-based Adversarial Training for Generalized Attack Immunity of A3C Path Finding

As adversarial attacks pose a serious threat to the security of AI syste...

On the Effectiveness of Adversarial Training against Backdoor Attacks

DNNs' demand for massive data forces practitioners to collect data from ...

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

This paper proposes an attack-independent (non-adversarial training) tec...

A2: Efficient Automated Attacker for Boosting Adversarial Training

Based on the significant improvement of model robustness by AT (Adversar...

Adaptive versus Standard Descent Methods and Robustness Against Adversarial Examples

Adversarial examples are a pervasive phenomenon of machine learning mode...