Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training

10/14/2019
by   David Stutz, et al.
Adversarial training is the standard approach for training models that are robust against adversarial examples. However, especially on complex datasets, it incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT), whose key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT better preserves the accuracy of normal training, while robustness against adversarial examples is achieved via confidence thresholding. Most importantly, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training. We also discuss our extensive work on designing strong adaptive attacks against CCAT and standard adversarial training, which is of independent interest. We present experimental results on MNIST, SVHN, and Cifar10.
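
The two mechanisms named in the abstract, confidence that decays with distance and rejection by confidence thresholding, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the L-infinity distance, the power-law decay, and the parameters `epsilon` and `rho` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def decayed_target(label, num_classes, delta, epsilon, rho=10.0):
    """Build a training target whose confidence shrinks as the perturbation grows.

    Assumptions (not stated in the abstract): distance is the L-infinity norm
    of `delta`, and the decay follows (1 - distance / epsilon) ** rho.
    """
    one_hot = np.eye(num_classes)[label]
    uniform = np.full(num_classes, 1.0 / num_classes)
    # Perturbation size relative to the training budget, capped at 1.
    ratio = min(1.0, float(np.max(np.abs(delta))) / epsilon)
    # lam = 1 for an unperturbed input, lam = 0 at or beyond the budget.
    lam = (1.0 - ratio) ** rho
    return lam * one_hot + (1.0 - lam) * uniform


def accept(probs, threshold):
    """Confidence thresholding: keep a prediction only if it is confident enough."""
    return float(np.max(probs)) >= threshold
```

Under these assumptions, an unperturbed input keeps its one-hot target, while an input perturbed up to the full budget is assigned the uniform distribution. A model trained on such targets would output low confidence on heavily perturbed inputs, which `accept` would then reject for any threshold above `1 / num_classes`.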

research
05/15/2019

On Norm-Agnostic Robustness of Adversarial Training

Adversarial examples are carefully perturbed inputs for fooling machine...
research
10/01/2021

Calibrated Adversarial Training

Adversarial training is an approach of increasing the robustness of mode...
research
07/18/2018

Gradient Band-based Adversarial Training for Generalized Attack Immunity of A3C Path Finding

As adversarial attacks pose a serious threat to the security of AI syste...
research
02/22/2022

On the Effectiveness of Adversarial Training against Backdoor Attacks

DNNs' demand for massive data forces practitioners to collect data from ...
research
03/15/2021

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

This paper proposes an attack-independent (non-adversarial training) tec...
research
10/07/2022

A2: Efficient Automated Attacker for Boosting Adversarial Training

Based on the significant improvement of model robustness by AT (Adversar...
research
11/09/2019

Adaptive versus Standard Descent Methods and Robustness Against Adversarial Examples

Adversarial examples are a pervasive phenomenon of machine learning mode...
