Adversarial Robustness on In- and Out-Distribution Improves Explainability

03/20/2020
by   Maximilian Augustin, et al.

Neural networks have led to major improvements in image classification, but they suffer from three weaknesses: they are not robust to adversarial changes, their uncertainty estimates on out-distribution samples are unreliable, and their black-box decisions are inscrutable. In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution. RATIO has generative properties similar to those of adversarial training, so that visual counterfactuals produce class-specific features. While adversarial training comes at the price of lower clean accuracy, RATIO achieves state-of-the-art l_2-adversarial robustness on CIFAR10 and maintains better clean accuracy.
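The abstract does not state the training objective, so the following is only a minimal sketch of the general idea behind adversarial training on both in- and out-distribution data, not the authors' implementation. It assumes a toy linear softmax classifier, a projected gradient ascent attack inside an l_2 ball, a one-hot target for in-distribution points, and a uniform target for out-distribution points. All names (`l2_pgd`, `ratio_style_loss`, `eps_in`, `eps_out`, `lam`) are hypothetical and chosen for illustration.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, target):
    # target is a probability vector: one-hot (in-distribution) or uniform (out-distribution)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def logits_linear(W, b, x):
    # toy model (assumption): logits = W x + b
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def input_grad(W, b, x, target):
    # gradient of the cross-entropy w.r.t. the input of a linear softmax model: W^T (p - t)
    p = softmax(logits_linear(W, b, x))
    diff = [pk - tk for pk, tk in zip(p, target)]
    return [sum(diff[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def l2_pgd(W, b, x, target, eps, steps=10, step_size=None):
    # projected gradient ascent on the loss, constrained to an l_2 ball of radius eps
    if step_size is None:
        step_size = 2.5 * eps / steps
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = input_grad(W, b, x_adv, target)
        g_norm = math.sqrt(sum(v * v for v in g))
        if g_norm == 0.0:
            break  # no ascent direction available
        delta = [di + step_size * gi / g_norm for di, gi in zip(delta, g)]
        d_norm = math.sqrt(sum(v * v for v in delta))
        if d_norm > eps:  # project back onto the ball
            delta = [di * eps / d_norm for di in delta]
    return [xi + di for xi, di in zip(x, delta)]

def ratio_style_loss(W, b, x_in, y_in, x_out, eps_in, eps_out, lam):
    # worst-case loss on an in-distribution point plus a weighted worst-case
    # loss towards the uniform distribution on an out-distribution point
    K = len(W)
    one_hot = [1.0 if k == y_in else 0.0 for k in range(K)]
    uniform = [1.0 / K] * K
    x_in_adv = l2_pgd(W, b, x_in, one_hot, eps_in)
    x_out_adv = l2_pgd(W, b, x_out, uniform, eps_out)
    loss_in = cross_entropy(softmax(logits_linear(W, b, x_in_adv)), one_hot)
    loss_out = cross_entropy(softmax(logits_linear(W, b, x_out_adv)), uniform)
    return loss_in + lam * loss_out

# Usage with made-up numbers: two classes, two input features.
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
print(ratio_style_loss(W, b, x_in=[1.0, 0.0], y_in=0, x_out=[0.2, 0.0],
                       eps_in=0.5, eps_out=1.0, lam=1.0))
```

In this sketch the out-distribution term penalizes the highest confidence the attack can reach near an out-distribution point, which is one way to read "reliable and robust confidence estimates on the out-distribution". A real setup would use a deep network, automatic differentiation, and mini-batches.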

research
06/16/2019

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Accuracy

Adversarial robustness has become a central goal in deep learning, both ...
research
06/27/2023

DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization

Adversarial training is one of the best-performing methods in improving ...
research
05/26/2019

Robust Classification using Robust Feature Augmentation

Existing deep neural networks, say for image classification, have been s...
research
03/18/2020

Improving Adversarial Robustness Through Progressive Hardening

Adversarial training (AT) has become a popular choice for training robus...
research
11/30/2021

Pyramid Adversarial Training Improves ViT Performance

Aggressive data augmentation is a key component of the strong generaliza...
research
04/19/2021

Improving Adversarial Robustness Using Proxy Distributions

We focus on the use of proxy distributions, i.e., approximations of the ...
research
08/19/2022

DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization

We consider the problem of OOD generalization, where the goal is to trai...
