Adversarial Training with Rectified Rejection

by Tianyu Pang, et al.

Adversarial training (AT) is one of the most effective strategies for promoting model robustness, yet even state-of-the-art adversarially trained models struggle to exceed 60% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, vanilla confidence can overestimate the model's certainty when the input is wrongly classified. To this end, we propose to use true confidence (T-Con), i.e., the predicted probability of the true class, as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also quantify that training R-Con to be aligned with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks for improving robustness, with little extra computation.
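To make the quantities in the abstract concrete, here is a minimal Python sketch of confidence, T-Con, and a coupled rejector. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how confidence is rectified, so the multiplicative form (confidence scaled by an auxiliary score `aux` in [0, 1]), the function names, and both thresholds are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs):
    """Vanilla confidence: probability of the predicted class."""
    return max(probs)


def true_confidence(probs, label):
    """T-Con: probability of the true class (needs the label, so it is an oracle)."""
    return probs[label]


def rectified_confidence(probs, aux):
    """R-Con, ASSUMED here to be confidence scaled by an auxiliary score in [0, 1].

    `aux` stands in for the output of a learned rectifier; the exact form
    is an assumption made for this sketch.
    """
    if not 0.0 <= aux <= 1.0:
        raise ValueError("aux must lie in [0, 1]")
    return confidence(probs) * aux


def coupled_accept(probs, aux, conf_threshold, rcon_threshold=0.5):
    """Accept an input only if both rejectors pass (thresholds are hypothetical)."""
    return (confidence(probs) > conf_threshold
            and rectified_confidence(probs, aux) > rcon_threshold)
```

For a misclassified input, confidence can stay high while T-Con is low. If `aux` is chosen so that R-Con matches T-Con, the R-Con check rejects the input even though the confidence check alone would have accepted it.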




Boosting Adversarial Training with Hypersphere Embedding

Adversarial training (AT) is one of the most effective defenses to impro...

Learnable Boundary Guided Adversarial Training

Previous adversarial training raises model robustness under the compromi...

Robust Classification via a Single Diffusion Model

Recently, diffusion models have been successfully applied to improving a...

Why adversarial training can hurt robust accuracy

Machine learning classifiers with high test accuracy often perform poorl...

Raising the Bar for Certified Adversarial Robustness with Diffusion Models

Certified defenses against adversarial attacks offer formal guarantees o...

Code Repositories


Improving adversarial robustness by a coupling rejection strategy
