Provable defenses against adversarial examples via the convex outer adversarial polytope

11/02/2017
by Eric Wong and J. Zico Kolter

We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations (on the training data; for previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well). The basic idea of the approach is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst-case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can itself be represented as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a toy 2D robust classification task, and on a simple convolutional architecture applied to MNIST, where we produce a classifier that provably has less than 8.4% test error for any adversarial attack with bounded ℓ∞ norm less than ϵ = 0.1. This represents the largest verified network that we are aware of, and we discuss future challenges in scaling the approach to much larger domains.
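To give a concrete sense of how the dual of the linear program can be evaluated as a single backward pass through a backpropagation-like network, here is a minimal NumPy sketch for a one-hidden-layer ReLU network under an ℓ∞ perturbation of radius ϵ. The function and variable names are our own, the pre-activation bounds use simple interval arithmetic (which is exact only for the first layer), and ambiguous ReLUs use the u/(u−l) relaxation described in the paper; treat it as a sketch of the idea under these assumptions, not a reference implementation.

```python
import numpy as np

def certified_lower_bound(W1, b1, W2, b2, x, c, eps):
    """Sketch of a guaranteed lower bound on c^T f(x') over ||x' - x||_inf <= eps,
    where f(x') = W2 @ relu(W1 @ x' + b1) + b2 (single hidden layer).

    A positive return value certifies c^T f(x') > 0 for every perturbation in
    the l_inf ball, e.g. with c = e_true - e_target no attack can flip that margin.
    """
    # Pre-activation bounds for the hidden layer; exact for the first layer,
    # since max_{||d||_inf <= eps} w^T d = eps * ||w||_1 for each row w of W1.
    z_hat = W1 @ x + b1
    slack = eps * np.abs(W1).sum(axis=1)
    l, u = z_hat - slack, z_hat + slack

    # Diagonal "activation" of the dual network:
    #   1 where the ReLU is provably active (l >= 0),
    #   0 where it is provably inactive (u <= 0),
    #   u/(u-l) where the sign is ambiguous (the convex relaxation).
    d = np.zeros_like(z_hat)
    d[l >= 0] = 1.0
    unstable = (l < 0) & (u > 0)
    d[unstable] = u[unstable] / (u[unstable] - l[unstable])

    # One backward pass through the dual (backpropagation-like) network.
    nu_out = -c              # dual variable at the output layer
    nu_hat = W2.T @ nu_out   # pulled back through the output weights
    nu = d * nu_hat          # passed through the relaxed ReLU
    nu_hat1 = W1.T @ nu      # dual variable at the input

    # Dual objective: a lower bound on min_{x'} c^T f(x') over the outer polytope.
    J = (-nu_out @ b2
         - nu @ b1
         - x @ nu_hat1
         - eps * np.abs(nu_hat1).sum()
         + (l[unstable] * np.clip(nu[unstable], 0, None)).sum())
    return J
```

Because the bound is just a few extra matrix passes, it can be plugged into the training loss in place of the ordinary logits, which is what yields the provable robustness guarantee rather than an empirical one.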

Related research

05/31/2018 · Scaling provable adversarial defenses
Recent work has developed methods for learning deep network classifiers ...

05/27/2019 · Provable robustness against all adversarial l_p-perturbations for p ≥ 1
In recent years several adversarial attacks and defenses have been propo...

10/03/2022 · MultiGuard: Provably Robust Multi-label Classification against Adversarial Examples
Multi-label classification, which predicts a set of labels for an input,...

10/22/2018 · Cost-Sensitive Robustness against Adversarial Examples
Several recent works have developed methods for training classifiers tha...

04/10/2020 · Luring of Adversarial Perturbations
The growing interest for adversarial examples, i.e. maliciously modified...

10/25/2018 · Evading classifiers in discrete domains with provable optimality guarantees
Security-critical applications such as malware, fraud, or spam detection...

03/20/2019 · Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes
We propose a novel method for computing exact pointwise robustness of de...
