Adversarial Training with Rectified Rejection

by   Tianyu Pang, et al.

Adversarial training (AT) is one of the most effective strategies for promoting model robustness, whereas even the state-of-the-art adversarially trained models struggle to exceed 60 additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, the vanilla confidence can overestimate the model certainty if the input is wrongly classified. To this end, we propose to use true confidence (T-Con) (i.e., predicted probability of the true class) as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also quantify that training R-Con to be aligned with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is well compatible with different AT frameworks on improving robustness, with little extra computation.


page 1

page 2

page 3

page 4


Boosting Adversarial Training with Hypersphere Embedding

Adversarial training (AT) is one of the most effective defenses to impro...

Robust Local Features for Improving the Generalization of Adversarial Training

Adversarial training has been demonstrated as one of the most effective ...

Why adversarial training can hurt robust accuracy

Machine learning classifiers with high test accuracy often perform poorl...

Robust or Private? Adversarial Training Makes Models More Vulnerable to Privacy Attacks

Adversarial training was introduced as a way to improve the robustness o...

Get Fooled for the Right Reason: Improving Adversarial Robustness through a Teacher-guided Curriculum Learning Approach

Current SOTA adversarially robust models are mostly based on adversarial...

Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task

Genre identification is a subclass of non-topical text classification. T...

Code Repositories


Improving adversarial robustness by a coupling rejection strategy

view repo