Adversarial Training with Rectified Rejection

05/31/2021
by Tianyu Pang et al.

Adversarial training (AT) is one of the most effective strategies for promoting model robustness, whereas even the state-of-the-art adversarially trained models struggle to exceed 60% robust test accuracy without additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, the vanilla confidence can overestimate the model's certainty when the input is wrongly classified. To this end, we propose to use true confidence (T-Con) (i.e., the predicted probability of the true class) as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also show that training R-Con to align with T-Con can be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks for improving robustness, with little extra computation.
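The rejection rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `rectifier_score` stands in for the paper's learned auxiliary output (a sigmoid-bounded head trained to rectify confidence toward T-Con), whose architecture and training are not reproduced here, and the threshold value is a placeholder.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rectified_rejection(logits, rectifier_score, threshold=0.5):
    """Combine vanilla confidence with a learned rectifier output in
    [0, 1] to form the rectified confidence (R-Con), and reject any
    input whose R-Con falls below the threshold.

    rectifier_score is a hypothetical stand-in for the auxiliary head
    that the RR module would learn during adversarial training."""
    probs = softmax(logits)
    confidence = probs.max(axis=-1)        # vanilla confidence
    r_con = confidence * rectifier_score   # rectified confidence
    predictions = probs.argmax(axis=-1)
    accept = r_con >= threshold            # False => reject the input
    return predictions, accept

# Example: a confident, well-rectified input is accepted; a low-margin
# input with a low rectifier score is rejected.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.3, 0.2, 0.1]])
rectifier = np.array([0.95, 0.40])
preds, accept = rectified_rejection(logits, rectifier, threshold=0.5)
```

Multiplying confidence by a bounded rectifier can only lower the score, which matches the motivation in the abstract: the vanilla confidence overestimates certainty on misclassified inputs, so the rectifier learns to pull it down toward the true-class probability.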


