Adversarial Training with Rectified Rejection

05/31/2021
by Tianyu Pang, et al.

Adversarial training (AT) is one of the most effective strategies for promoting model robustness, yet even state-of-the-art adversarially trained models struggle to exceed 60% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, vanilla confidence can overestimate model certainty when the input is wrongly classified. To this end, we propose to use true confidence (T-Con), i.e., the predicted probability of the true class, as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that, under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also quantify that training R-Con to align with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks and improves robustness with little extra computation.
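
To make the quantities in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It computes confidence (the maximum predicted probability) and T-Con (the predicted probability of the true class), and couples a confidence rejector with an R-Con rejector. The abstract does not say how confidence is rectified, so the multiplicative factor `rect` in [0, 1], the function names, and the 0.5 thresholds are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def confidence(probs):
    # Vanilla confidence: probability assigned to the predicted class.
    return probs.max(axis=1)

def true_confidence(probs, labels):
    # T-Con: probability assigned to the true class (needs labels,
    # so it is only available as a training-time oracle).
    return probs[np.arange(len(labels)), labels]

def rectified_confidence(probs, rect):
    # R-Con under the ASSUMPTION that rectification is a per-input
    # multiplicative factor in [0, 1] applied to vanilla confidence.
    return confidence(probs) * rect

def coupled_accept(probs, rect, conf_thresh=0.5, rcon_thresh=0.5):
    # Accept an input only if BOTH rejectors pass it.
    # The threshold values are hypothetical placeholders.
    passes_conf = confidence(probs) > conf_thresh
    passes_rcon = rectified_confidence(probs, rect) > rcon_thresh
    return passes_conf & passes_rcon

# Two toy inputs: the first is classified correctly, the second is not.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0]])
labels = np.array([0, 2])
probs = softmax(logits)

conf = confidence(probs)
tcon = true_confidence(probs, labels)

# An idealized rectifier that maps confidence exactly onto T-Con.
# A learned rectifier would only approximate this ratio.
ideal_rect = tcon / conf
accept = coupled_accept(probs, ideal_rect)
```

In this toy case the misclassified input has high vanilla confidence but low T-Con, which is the overestimation the abstract describes. With the idealized rectifier, R-Con equals T-Con, so the coupled rule rejects the misclassified input and keeps the correct one.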

02/20/2020

Boosting Adversarial Training with Hypersphere Embedding

Adversarial training (AT) is one of the most effective defenses to impro...
09/23/2019

Robust Local Features for Improving the Generalization of Adversarial Training

Adversarial training has been demonstrated as one of the most effective ...
03/03/2022

Why adversarial training can hurt robust accuracy

Machine learning classifiers with high test accuracy often perform poorl...
06/15/2019

Robust or Private? Adversarial Training Makes Models More Vulnerable to Privacy Attacks

Adversarial training was introduced as a way to improve the robustness o...
10/30/2021

Get Fooled for the Right Reason: Improving Adversarial Robustness through a Teacher-guided Curriculum Learning Approach

Current SOTA adversarially robust models are mostly based on adversarial...
06/15/2022

Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task

Genre identification is a subclass of non-topical text classification. T...

Code Repositories

Rectified-Rejection

Improving adversarial robustness by a coupling rejection strategy

